Show simple item record

dc.contributor.author Hameed, RA
dc.contributor.author Pathirennehelage, N
dc.contributor.author Ihalapathirana, A
dc.contributor.author Mohamed, MZ
dc.contributor.author Ranathunga, VSD
dc.contributor.author Jayasena, S
dc.contributor.author Dias, G
dc.contributor.author Fernando, S
dc.date.accessioned 2017-01-16T04:01:11Z
dc.date.available 2017-01-16T04:01:11Z
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/12221
dc.description.abstract A sentence-aligned parallel corpus is an important prerequisite for statistical machine translation. However, manually creating such a corpus is time-consuming and requires experts fluent in both languages. Automatically creating a sentence-aligned parallel corpus from parallel text addresses this problem. In this paper, we present the first empirical evaluation carried out to identify the best method for automatically creating a sentence-aligned Sinhala-Tamil parallel corpus. Annual reports from Sri Lankan government institutions were used as the parallel text for alignment. Although both Sinhala and Tamil are under-resourced languages, we achieved an F-score of 0.791 using a hybrid approach that makes use of a bilingual dictionary. en_US
dc.relation.uri http://www.aclweb.org/anthology/W/W16/W16-37.pdf en_US
dc.source.uri http://www.aclweb.org/anthology/W/W16/W16-37.pdf en_US
dc.title Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus en_US
dc.type Article-Abstract en_US
dc.identifier.year 2016 en_US
dc.identifier.journal WSSANLP en_US
dc.identifier.pgnos 124 en_US
dc.identifier.email gihan@uom.lk en_US


