Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus

dc.contributor.author: Hameed, RA
dc.contributor.author: Pathirennehelage, N
dc.contributor.author: Ihalapathirana, A
dc.contributor.author: Mohamed, MZ
dc.contributor.author: Ranathunga, VSD
dc.contributor.author: Jayasena, S
dc.contributor.author: Dias, G
dc.contributor.author: Fernando, S
dc.date.accessioned: 2017-01-16T04:01:11Z
dc.date.available: 2017-01-16T04:01:11Z
dc.description.abstract: A sentence aligned parallel corpus is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time-consuming and requires experts fluent in both languages. Automatic creation of a sentence aligned parallel corpus using parallel text is the solution to this problem. In this paper, we present the first ever empirical evaluation carried out to identify the best method to automatically create a sentence aligned Sinhala-Tamil parallel corpus. Annual reports from Sri Lankan government institutions were used as the parallel text for aligning. Despite both Sinhala and Tamil being under-resourced languages, we were able to achieve an F-score value of 0.791 using a hybrid approach that makes use of a bilingual dictionary. (en_US)
dc.identifier.email: gihan@uom.lk (en_US)
dc.identifier.journal: WSSANLP (en_US)
dc.identifier.pgnos: 124 (en_US)
dc.identifier.uri: http://dl.lib.mrt.ac.lk/handle/123/12221
dc.identifier.year: 2016 (en_US)
dc.relation.uri: http://www.aclweb.org/anthology/W/W16/W16-37.pdf (en_US)
dc.source.uri: http://www.aclweb.org/anthology/W/W16/W16-37.pdf (en_US)
dc.title: Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus (en_US)
dc.type: Article-Abstract (en_US)
