Institutional-Repository, University of Moratuwa.  

Erroff: a tool to identify and correct real-word errors in Sinhala documents

dc.contributor.author Sudesh, P
dc.contributor.author Dashintha, D
dc.contributor.author Lakshan, R
dc.contributor.author Dias, G
dc.contributor.editor Rathnayake, M
dc.contributor.editor Adhikariwatte, V
dc.contributor.editor Hemachandra, K
dc.date.accessioned 2022-10-27T09:44:47Z
dc.date.available 2022-10-27T09:44:47Z
dc.date.issued 2022-07
dc.identifier.citation P. Sudesh, D. Dashintha, R. Lakshan and G. Dias, "Erroff: A Tool to Identify and Correct Real-word Errors in Sinhala Documents," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906294. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/19274
dc.description.abstract Sinhala is a low-resource Indo-Aryan language spoken by approximately 16 million people, mainly in Sri Lanka. The complexity of the Sinhala language makes detecting spelling errors difficult. A real-word error occurs when a word is in the vocabulary but is not valid in the context in which it appears. Detecting real-word errors in a sentence is more difficult than detecting non-word errors, which are words that are not in the vocabulary. We present the implementation of a neural-network-based system for identifying real-word errors and non-word errors in Sinhala. We prepared a candidate list of real-word errors, selected a suitable model, and trained it on several different datasets. This paper thus sets a new baseline for the detection and correction of real-word errors in Sinhala documents. Our product, source code, candidate error list, training datasets, and evaluation dataset are publicly released. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/9906294 en_US
dc.subject Sinhala en_US
dc.subject NLP en_US
dc.subject Real-word errors en_US
dc.subject Spell checker en_US
dc.title Erroff: a tool to identify and correct real-word errors in Sinhala documents en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2022 en_US
dc.identifier.conference Moratuwa Engineering Research Conference 2022 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.proceeding Proceedings of Moratuwa Engineering Research Conference 2022 en_US
dc.identifier.email sudeshdilshan.17@cse.mrt.ac.lk
dc.identifier.email dashinthadilan.17@cse.mrt.ac.lk
dc.identifier.email rashnanayakkara.17@cse.mrt.ac.lk
dc.identifier.email gihan@uom.lk
dc.identifier.doi 10.1109/MERCon55799.2022.9906294 en_US