Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil
| dc.contributor.author | Farhath, F | |
| dc.contributor.author | Ranathunga, S | |
| dc.contributor.author | Jayasena, S | |
| dc.contributor.author | Dias, G | |
| dc.contributor.editor | Chathuranga, D | |
| dc.date.accessioned | 2022-08-16T04:34:59Z | |
| dc.date.available | 2022-08-16T04:34:59Z | |
| dc.date.issued | 2018-05 | |
| dc.description.abstract | The availability of quality parallel data is a major requirement for building a reasonably well-performing statistical machine translation (SMT) system. Developing a decent SMT system for a low-resourced language pair such as Sinhala and Tamil, which lacks a large parallel corpus, is therefore rather challenging. Past research on other language pairs has shown that different terminology / bilingual list integration methodologies can improve the quality of SMT systems, particularly for domain-specific SMT. In this paper, we explore whether this approach is effective for Sinhala-Tamil machine translation in the domain of official government documents. We evaluate the impact of three types of bilingual lists, namely a list of government organizations and official designations, a glossary related to government administration and operations, and a general bilingual dictionary, using four different methodologies (three static and one dynamic). Of the four, one methodology gave notable improvements over the baseline for all three types of list. | en_US |
| dc.identifier.citation | F. Farhath, S. Ranathunga, S. Jayasena and G. Dias, "Integration of Bilingual Lists for Domain-Specific Statistical Machine Translation for Sinhala-Tamil," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 538-543, doi: 10.1109/MERCon.2018.8421901. | en_US |
| dc.identifier.conference | 2018 Moratuwa Engineering Research Conference (MERCon) | en_US |
| dc.identifier.department | Engineering Research Unit, University of Moratuwa | en_US |
| dc.identifier.doi | 10.1109/MERCon.2018.8421901 | en_US |
| dc.identifier.email | fathimafarhath@cse.mrt.ac.lk | en_US |
| dc.identifier.email | surangika@cse.mrt.ac.lk | en_US |
| dc.identifier.email | sanath@cse.mrt.ac.lk | en_US |
| dc.identifier.email | gihan@cse.mrt.ac.lk | en_US |
| dc.identifier.faculty | Engineering | en_US |
| dc.identifier.pgnos | pp. 538-543 | en_US |
| dc.identifier.proceeding | Proceedings of 2018 Moratuwa Engineering Research Conference (MERCon) | en_US |
| dc.identifier.uri | http://dl.lib.uom.lk/handle/123/18646 | |
| dc.identifier.year | 2018 | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | IEEE | en_US |
| dc.relation.uri | https://ieeexplore.ieee.org/document/8421901 | en_US |
| dc.subject | statistical machine translation | en_US |
| dc.subject | Sinhala, Tamil | en_US |
| dc.subject | low-resourced | en_US |
| dc.subject | terminology integration | en_US |
| dc.title | Integration of bilingual lists for domain-specific statistical machine translation for Sinhala-Tamil | en_US |
| dc.type | Conference-Full-text | en_US |
