Erroff: A Tool to Identify and Correct Real-word Errors in Sinhala Documents

Date

2022-07

Publisher

IEEE

Abstract

Sinhala is a low-resource Indo-Aryan language used by approximately 16 million people, mainly in Sri Lanka. Because of the complexity of the Sinhala language, detecting spelling errors is difficult. A real-word error occurs when a word is in the vocabulary but is not valid in the context in which it appears. Checking a sentence for real-word errors is harder than checking for non-word errors, where the erroneous word is not in the vocabulary at all. We present the implementation of a neural-network-based system for identifying real-word errors and non-word errors in Sinhala. We prepared a candidate list of real-word errors, selected a suitable model, and trained it on several different datasets. This paper thus sets a new baseline for the detection and correction of real-word errors in Sinhala documents. Our product, source code, candidate error list, training datasets, and evaluation dataset are publicly released.
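The distinction between the two error types can be illustrated with a minimal sketch. This is not the neural model described in the abstract; it is only a vocabulary lookup, and it assumes a toy English vocabulary and a hypothetical confusion-set table (`VOCAB`, `CONFUSION_SETS`, and `flag_tokens` are illustrative names, not part of Erroff). A word missing from the vocabulary is a non-word error. A word that is in the vocabulary but belongs to a confusion set is only a candidate for a real-word error, because deciding whether it is actually wrong requires a model of the surrounding context.

```python
from typing import Dict, List, Set, Tuple

# Toy English stand-ins for a Sinhala vocabulary (hypothetical data).
VOCAB: Set[str] = {"i", "want", "a", "piece", "peace", "of", "cake"}

# Hypothetical confusion sets: valid words that are easily mistaken for each other.
CONFUSION_SETS: Dict[str, Set[str]] = {
    "piece": {"peace"},
    "peace": {"piece"},
}


def flag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each token as 'non-word', 'real-word-candidate', or 'ok'."""
    labels: List[Tuple[str, str]] = []
    for tok in tokens:
        word = tok.lower()
        if word not in VOCAB:
            # Not in the vocabulary: detectable by lookup alone.
            labels.append((tok, "non-word"))
        elif word in CONFUSION_SETS:
            # In the vocabulary, but possibly wrong in this context;
            # a context-aware model would have to make the final call.
            labels.append((tok, "real-word-candidate"))
        else:
            labels.append((tok, "ok"))
    return labels


if __name__ == "__main__":
    for token, label in flag_tokens(["I", "want", "a", "peace", "of", "cke"]):
        print(f"{token}\t{label}")
```

In this sketch the lookup alone catches "cke", while "peace" is merely flagged for inspection; the lookup cannot tell that "piece" was intended, which is the part the abstract assigns to a trained model.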

Citation

P. Sudesh, D. Dashintha, R. Lakshan and G. Dias, "Erroff: A Tool to Identify and Correct Real-word Errors in Sinhala Documents," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906294.
