Institutional-Repository, University of Moratuwa.  

Transliteration and byte pair encoding to improve Tamil to Sinhala neural machine translation

dc.contributor.author Tennage, P
dc.contributor.author Herath, A
dc.contributor.author Thilakarathne, M
dc.contributor.author Sandaruwan, P
dc.contributor.author Ranathunga, S
dc.contributor.editor Chathuranga, D
dc.date.accessioned 2022-08-24T04:50:31Z
dc.date.available 2022-08-24T04:50:31Z
dc.date.issued 2018-05
dc.identifier.citation P. Tennage, A. Herath, M. Thilakarathne, P. Sandaruwan and S. Ranathunga, "Transliteration and Byte Pair Encoding to Improve Tamil to Sinhala Neural Machine Translation," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 390-395, doi: 10.1109/MERCon.2018.8421939. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/18694
dc.description.abstract Neural Machine Translation (NMT) is the current state-of-the-art machine translation technique. However, the applicability of NMT to language pairs with high morphological variation is still debatable. A lack of language resources, especially a sufficiently large parallel corpus, causes additional issues and leads to very poor translation performance when NMT is applied to such languages. In this paper, we present three techniques to improve domain-specific NMT performance for the under-resourced language pair Sinhala and Tamil, both of which have high morphological variation. Of these three techniques, transliteration is a novel approach to improving domain-specific NMT performance for language pairs such as Sinhala and Tamil that share a common grammatical structure and have moderate lexical similarity. We built the first transliteration system for Sinhala to English and Tamil to English, which achieved an accuracy of 99.6% when tested with the parallel corpus we used for NMT training. The other technique we employed is Byte Pair Encoding (BPE), which has been used to achieve open-vocabulary translation with a fixed vocabulary of subword symbols. Our experiments show that while translation based on independent BPE models and on pure transliteration performs moderately, integrating transliteration to build a joint BPE model for this language pair increases translation quality by 1.68 BLEU points. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/8421939 en_US
dc.subject neural machine translation en_US
dc.subject transliteration en_US
dc.subject byte pair encoding en_US
dc.subject sinhala en_US
dc.subject tamil en_US
dc.title Transliteration and byte pair encoding to improve Tamil to Sinhala neural machine translation en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2018 en_US
dc.identifier.conference 2018 Moratuwa Engineering Research Conference (MERCon) en_US
dc.identifier.pgnos pp. 390-395 en_US
dc.identifier.proceeding Proceedings of 2018 Moratuwa Engineering Research Conference (MERCon) en_US
dc.identifier.email pasindu.13@cse.mrt.ac.lk en_US
dc.identifier.email narmada.ah.13@cse.mrt.ac.lk en_US
dc.identifier.email malith.13@cse.mrt.ac.lk en_US
dc.identifier.email prabath.sandaruwan.13@cse.mrt.ac.lk en_US
dc.identifier.email surangika@cse.mrt.ac.lk en_US
dc.identifier.doi 10.1109/MERCon.2018.8421939 en_US

