Institutional-Repository, University of Moratuwa.  

Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil

dc.contributor.author Farhath, F
dc.contributor.author Ranathunga, S
dc.contributor.author Jayasena, S
dc.contributor.author Dias, G
dc.contributor.editor Chathuranga, D
dc.date.accessioned 2022-08-16T04:34:59Z
dc.date.available 2022-08-16T04:34:59Z
dc.date.issued 2018-05
dc.identifier.citation F. Farhath, S. Ranathunga, S. Jayasena and G. Dias, "Integration of Bilingual Lists for Domain-Specific Statistical Machine Translation for Sinhala-Tamil," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 538-543, doi: 10.1109/MERCon.2018.8421901. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/18646
dc.description.abstract The availability of high-quality parallel data is a major requirement for building a well-performing statistical machine translation (SMT) system. Developing a good SMT system for a low-resourced language pair such as Sinhala and Tamil, which lacks a large parallel corpus, is therefore challenging. Past research on other language pairs has shown that terminology and bilingual list integration methodologies can improve the quality of SMT systems, particularly for domain-specific SMT. In this paper, we explore whether such methods are effective for Sinhala-Tamil machine translation in the domain of official government documents. We evaluate the impact of three types of bilingual lists (a list of government organizations and official designations, a glossary related to government administration and operations, and a general bilingual dictionary) using four different methodologies (three static and one dynamic). Of the four, one methodology gave notable improvements over the baseline for all three types of list. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/8421901 en_US
dc.subject statistical machine translation en_US
dc.subject Sinhala, Tamil en_US
dc.subject low-resourced en_US
dc.subject terminology integration en_US
dc.title Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2018 en_US
dc.identifier.conference 2018 Moratuwa Engineering Research Conference (MERCon) en_US
dc.identifier.pgnos pp. 538-543 en_US
dc.identifier.proceeding Proceedings of 2018 Moratuwa Engineering Research Conference (MERCon) en_US
dc.identifier.email fathimafarhath@cse.mrt.ac.lk en_US
dc.identifier.email surangika@cse.mrt.ac.lk en_US
dc.identifier.email sanath@cse.mrt.ac.lk en_US
dc.identifier.email gihan@cse.mrt.ac.lk en_US
dc.identifier.doi 10.1109/MERCon.2018.8421901 en_US