Neural machine translation for low-resourced languages: Sinhala & Tamil [abstract]

dc.contributor.author: Thayasivam, U
dc.date.accessioned: 2025-07-23T05:29:26Z
dc.date.issued: 2019
dc.description: The following papers were published based on the results of this research project. [1] Naranpanawa, R., Perera, R., Fonseka, T., & Thayasivam, U. (2020). Analyzing subword techniques to improve English to Sinhala neural machine translation. International Journal of Asian Language Processing, 30(04), 2050017. https://doi.org/10.1142/S2717554520500174 [2] T. Fonseka, R. Naranpanawa, R. Perera and U. Thayasivam, "English to Sinhala Neural Machine Translation," 2020 International Conference on Asian Language Processing (IALP), Kuala Lumpur, Malaysia, 2020, pp. 305-309, doi: 10.1109/IALP51396.2020.9310462
dc.description.abstract: Neural Machine Translation (NMT) has emerged as a leading approach to machine translation, and it is particularly effective for resource-rich languages. This research addresses its limitations in low-resource settings, focusing on Sinhala, the primary language of Sri Lanka. Because the English competency of Sri Lankans is below average, translating English content into Sinhala has become an essential requirement, particularly for official government documents, which need high-quality translations. This study introduces an NMT system with Byte Pair Encoding (BPE), tailored to the English-Sinhala pair, with an emphasis on improving translation accuracy for Sri Lankan official documents. Beyond general NMT challenges, the research examines the difficulties specific to low-resource, morphologically rich languages such as Sinhala. While standard NMT surpasses Statistical Machine Translation when an ample parallel corpus is available, low-resource languages suffer from out-of-vocabulary (OOV) and rare-word problems. The study investigated several sub-word techniques and found empirically that they improve translation quality. It uses a state-of-the-art English-Sinhala translation system based on the Transformer architecture to explore sub-word techniques that alleviate the OOV and rare-word problems. Our experiments demonstrate how BPE can be incorporated to address the OOV problem in morphologically rich languages. Our models further show that sub-word segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
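Illustrative note: the sub-word idea mentioned in the abstract can be sketched in a few lines. The following is a minimal toy version of BPE merge learning and segmentation, assuming a tiny English word list; it is not the system or code used in the project, and the function names (`learn_bpe`, `merge_pair`, `segment`) and the corpus are hypothetical. It shows how an unseen word can still be split into known sub-word units instead of becoming an OOV token.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into sub-word units by replaying the merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low", "low", "lower", "newest", "newest", "widest"]  # hypothetical toy data
    rules = learn_bpe(corpus, num_merges=10)
    # "lowest" never appears in the corpus, yet it is covered by learned units.
    print(segment("lowest", rules))
```

In practice a maintained implementation would normally be used instead of hand-written code; the sketch only shows why a rare or unseen word does not have to map to a single unknown token.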
dc.description.sponsorship: Senate Research Committee
dc.identifier.accno: SRC204
dc.identifier.srgno: SRC/LT/2019/29
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/23914
dc.language.iso: en
dc.subject: SENATE RESEARCH COMMITTEE – Research Report
dc.subject: NEURAL MACHINE TRANSLATION
dc.subject: LOW-RESOURCED LANGUAGES-Sinhala
dc.subject: Tamil
dc.title: Neural machine translation for low-resourced languages: Sinhala & Tamil [abstract]
dc.type: SRC-Report

Files

Original bundle

Name: SRC204 - Dr. Uthyasanker SRCLT201929 Closng.pdf
Size: 971.2 KB
Format: Adobe Portable Document Format
Description: SRC Report

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission