Erroff: a tool to identify and correct real-word errors in Sinhala documents

dc.contributor.author: Sudesh, P
dc.contributor.author: Dashintha, D
dc.contributor.author: Lakshan, R
dc.contributor.author: Dias, G
dc.contributor.editor: Rathnayake, M
dc.contributor.editor: Adhikariwatte, V
dc.contributor.editor: Hemachandra, K
dc.date.accessioned: 2022-10-27T09:44:47Z
dc.date.available: 2022-10-27T09:44:47Z
dc.date.issued: 2022-07
dc.description.abstract: Sinhala is a low-resource Indo-Aryan language used by approximately 16 million people, mainly in Sri Lanka. Because of the complexity of the Sinhala language, detection of spelling errors is not so easy. A real-word error happens when a word is in the vocabulary but is not valid in the context in which it appears. Checking for real-word errors in a sentence is more difficult than checking for non-word errors, which are not in the vocabulary. We present the implementation of a neural-network based system for identifying real-word errors and non-word errors in Sinhala. We prepared a candidate list of real-word errors. Further, we have selected a suitable model and trained it using several different datasets. Thus, this paper sets a new baseline for the detection and correction of real-word errors in Sinhala documents. Our product, source code, candidate error list, training datasets, and evaluation dataset are publicly released. [en_US]
dc.identifier.citation: P. Sudesh, D. Dashintha, R. Lakshan and G. Dias, "Erroff: A Tool to Identify and Correct Real-word Errors in Sinhala Documents," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906294. [en_US]
dc.identifier.conference: Moratuwa Engineering Research Conference 2022 [en_US]
dc.identifier.department: Engineering Research Unit, University of Moratuwa [en_US]
dc.identifier.doi: 10.1109/MERCon55799.2022.9906294 [en_US]
dc.identifier.email: sudeshdilshan.17@cse.mrt.ac.lk
dc.identifier.email: dashinthadilan.17@cse.mrt.ac.lk
dc.identifier.email: rashnanayakkara.17@cse.mrt.ac.lk
dc.identifier.email: gihan@uom.lk
dc.identifier.faculty: Engineering [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.proceeding: Proceedings of Moratuwa Engineering Research Conference 2022 [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/19274
dc.identifier.year: 2022 [en_US]
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.uri: https://ieeexplore.ieee.org/document/9906294 [en_US]
dc.subject: Sinhala [en_US]
dc.subject: NLP [en_US]
dc.subject: Real-word errors [en_US]
dc.subject: Spell checker [en_US]
dc.title: Erroff: a tool to identify and correct real-word errors in Sinhala documents [en_US]
dc.type: Conference-Full-text [en_US]
