Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil

Farhath, F; Ranathunga, S; Jayasena, S; Dias, G

Date

2018-05

Publisher

IEEE

Abstract

The availability of quality parallel data is a major requirement for building a reasonably well-performing statistical machine translation (SMT) system. Developing a decent SMT system for a low-resourced language pair such as Sinhala and Tamil, which lacks a large parallel corpus, is therefore challenging. Past research on other language pairs has shown that various terminology / bilingual list integration methodologies can improve the quality of SMT systems, particularly for domain-specific SMT. In this paper, we explore whether this approach is effective for Sinhala-Tamil machine translation in the domain of official government documents. We evaluate the impact of three types of bilingual lists, namely a list of government organizations and official designations, a glossary related to government administration and operations, and a general bilingual dictionary, using four different methodologies (three static and one dynamic). Of the four, one methodology gave notable improvements over the baseline for all three types of list.

Keywords

statistical machine translation, Sinhala, Tamil, low-resourced, terminology integration

Citation

F. Farhath, S. Ranathunga, S. Jayasena and G. Dias, "Integration of Bilingual Lists for Domain-Specific Statistical Machine Translation for Sinhala-Tamil," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 538-543, doi: 10.1109/MERCon.2018.8421901.

URI

http://dl.lib.uom.lk/handle/123/18646

DOI

10.1109/MERCon.2018.8421901

Collections

MERCon - 2018
