Erroff: a tool to identify and correct real-word errors in Sinhala documents
dc.contributor.author | Sudesh, P | |
dc.contributor.author | Dashintha, D | |
dc.contributor.author | Lakshan, R | |
dc.contributor.author | Dias, G | |
dc.contributor.editor | Rathnayake, M | |
dc.contributor.editor | Adhikariwatte, V | |
dc.contributor.editor | Hemachandra, K | |
dc.date.accessioned | 2022-10-27T09:44:47Z | |
dc.date.available | 2022-10-27T09:44:47Z | |
dc.date.issued | 2022-07 | |
dc.description.abstract | Sinhala is a low-resource Indo-Aryan language used by approximately 16 million people, mainly in Sri Lanka. Because of the complexity of the Sinhala language, detecting spelling errors is difficult. A real-word error occurs when a word is in the vocabulary but is not valid in the context in which it appears. Checking a sentence for real-word errors is more difficult than checking for non-word errors, in which the erroneous word is not in the vocabulary. We present the implementation of a neural-network-based system for identifying real-word errors and non-word errors in Sinhala. We prepared a candidate list of real-word errors, then selected a suitable model and trained it using several different datasets. This paper thus sets a new baseline for the detection and correction of real-word errors in Sinhala documents. Our product, source code, candidate error list, training datasets, and evaluation dataset are publicly released. | en_US |
dc.identifier.citation | P. Sudesh, D. Dashintha, R. Lakshan and G. Dias, "Erroff: A Tool to Identify and Correct Real-word Errors in Sinhala Documents," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906294. | en_US |
dc.identifier.conference | Moratuwa Engineering Research Conference 2022 | en_US |
dc.identifier.department | Engineering Research Unit, University of Moratuwa | en_US |
dc.identifier.doi | 10.1109/MERCon55799.2022.9906294 | en_US |
dc.identifier.email | sudeshdilshan.17@cse.mrt.ac.lk | |
dc.identifier.email | dashinthadilan.17@cse.mrt.ac.lk | |
dc.identifier.email | rashnanayakkara.17@cse.mrt.ac.lk | |
dc.identifier.email | gihan@uom.lk | |
dc.identifier.faculty | Engineering | en_US |
dc.identifier.place | Moratuwa, Sri Lanka | en_US |
dc.identifier.proceeding | Proceedings of Moratuwa Engineering Research Conference 2022 | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/19274 | |
dc.identifier.year | 2022 | en_US |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.relation.uri | https://ieeexplore.ieee.org/document/9906294 | en_US |
dc.subject | Sinhala | en_US |
dc.subject | NLP | en_US |
dc.subject | Real-word errors | en_US |
dc.subject | Spell checker | en_US |
dc.title | Erroff: a tool to identify and correct real-word errors in Sinhala documents | en_US |
dc.type | Conference-Full-text | en_US |