Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus

Hameed, RA; Pathirennehelage, N; Ihalapathirana, A; Mohamed, MZ; Ranathunga, VSD; Jayasena, S; Dias, G; Fernando, S

Abstract

A sentence-aligned parallel corpus is an important prerequisite for statistical machine translation. However, creating such a corpus manually is time-consuming and requires experts fluent in both languages. Automatically creating a sentence-aligned parallel corpus from parallel text addresses this problem. In this paper, we present the first empirical evaluation carried out to identify the best method for automatically creating a sentence-aligned Sinhala-Tamil parallel corpus. Annual reports from Sri Lankan government institutions were used as the parallel text for alignment. Although both Sinhala and Tamil are under-resourced languages, we achieved an F-score of 0.791 using a hybrid approach that makes use of a bilingual dictionary.
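
As an illustration of how an F-score for sentence alignment can be computed, the sketch below compares predicted alignments against a hand-checked gold standard. It is a minimal example under stated assumptions, not the evaluation protocol of the paper, which this page does not describe: alignments are assumed to be one-to-one pairs of (Sinhala sentence index, Tamil sentence index), the balanced F1 variant is assumed, and the function name `alignment_f_score` and the sample data are hypothetical.

```python
def alignment_f_score(predicted, gold):
    """Return (precision, recall, F1) for sentence alignments.

    Both arguments are iterables of (source_index, target_index) pairs.
    This pair representation is an assumption made for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # pairs that match the gold standard

    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # Balanced F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: four gold alignments, three predicted ones.
gold = [(0, 0), (1, 1), (2, 2), (3, 3)]
predicted = [(0, 0), (1, 1), (2, 3)]

precision, recall, f1 = alignment_f_score(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Treating alignments as sets keeps the comparison independent of order; many-to-one alignments would need a different pair representation than the one assumed here.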

URI

http://dl.lib.mrt.ac.lk/handle/123/12221

Collections

Articles authored by UoM staff
