Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil

dc.contributor.author: Farhath, F
dc.contributor.author: Ranathunga, S
dc.contributor.author: Jayasena, S
dc.contributor.author: Dias, G
dc.contributor.editor: Chathuranga, D
dc.date.accessioned: 2022-08-16T04:34:59Z
dc.date.available: 2022-08-16T04:34:59Z
dc.date.issued: 2018-05
dc.description.abstract: Availability of quality parallel data is a major requirement for building a reasonably well-performing statistical machine translation (SMT) system. Thus, developing a decent SMT system for a low-resourced language pair like Sinhala and Tamil, which does not have a large parallel corpus, is rather challenging. Past research on other language pairs has shown that different terminology / bilingual list integration methodologies can be used to improve the quality of SMT systems, for domain-specific SMT in particular. In this paper, we explore whether this can be effective for Sinhala-Tamil machine translation in the domain of official government documents. We evaluate the impact of three types of bilingual lists, namely, a list of government organizations and official designations, a glossary related to government administration and operations, and a general bilingual dictionary, based on four different methodologies (three static and one dynamic). Of the four, one methodology gave notable improvements over the baseline for all three types of list. [en_US]
dc.identifier.citation: F. Farhath, S. Ranathunga, S. Jayasena and G. Dias, "Integration of Bilingual Lists for Domain-Specific Statistical Machine Translation for Sinhala-Tamil," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 538-543, doi: 10.1109/MERCon.2018.8421901. [en_US]
dc.identifier.conference: 2018 Moratuwa Engineering Research Conference (MERCon) [en_US]
dc.identifier.department: Engineering Research Unit, University of Moratuwa [en_US]
dc.identifier.doi: 10.1109/MERCon.2018.8421901 [en_US]
dc.identifier.email: fathimafarhath@cse.mrt.ac.lk [en_US]
dc.identifier.email: surangika@cse.mrt.ac.lk [en_US]
dc.identifier.email: sanath@cse.mrt.ac.lk [en_US]
dc.identifier.email: gihan@cse.mrt.ac.lk [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.pgnos: pp. 538-543 [en_US]
dc.identifier.proceeding: Proceedings of 2018 Moratuwa Engineering Research Conference (MERCon) [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/18646
dc.identifier.year: 2018 [en_US]
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.uri: https://ieeexplore.ieee.org/document/8421901 [en_US]
dc.subject: statistical machine translation [en_US]
dc.subject: Sinhala, Tamil [en_US]
dc.subject: low-resourced [en_US]
dc.subject: terminology integration [en_US]
dc.title: Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil [en_US]
dc.type: Conference-Full-text [en_US]
