Sinhala code-mixed text translation using neural machine translation

dc.contributor.advisor: Sumathipala S
dc.contributor.advisor: Silva T
dc.contributor.author: Archchana, K
dc.date.accept: 2024
dc.date.accessioned: 2024-10-10T07:59:10Z
dc.date.available: 2024-10-10T07:59:10Z
dc.date.issued: 2024
dc.description.abstract: Mixing two or more languages in communication is called code-mixing. It has become common in South Asian communities due to bilingualism and multilingualism. Sinhala-English code-mixed (SECM) text is the most popular form of language used in Sri Lanka in casual communication such as social media comments, posts, and chats. On social media platforms, content such as posts and comments is used for personalized advertisement recommendations, post recommendations, interesting-content recommendations, etc., to provide better customer service according to users' interests. Because of its code-mixed nature, most Sri Lankan social media content remains unused for recommendation purposes. This research therefore focuses mainly on translating SECM text into Sinhala. Once the content is converted into a standard language, it can be processed easily and used for the necessary purposes. In this research, we first conducted an in-depth analysis of Sinhala-English code-mixed text and identified the issues that act as barriers to translating SECM text into Sinhala. We also conducted a thorough literature review of code-mixed text analysis. An SECM-Sinhala parallel corpus of 5000 parallel sentences is used for this study. The proposed approach for SECM-to-Sinhala translation consists of a normalization layer, an Encoder-Decoder (Seq2Seq) framework, LSTM, and the Teacher Forcing mechanism. We compared the proposed approach with other approaches proposed for code-mixed text translation, and our approach achieved a significantly higher BLEU score. Keywords: Code-mixing, Bilingualism, Multilingualism, LSTM, Teacher Forcing [en_US]
dc.identifier.accno: TH5542 [en_US]
dc.identifier.citation: Archchana, K. (2024). Sinhala code-mixed text translation using neural machine translation [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22898
dc.identifier.degree: Master of Philosophy (MPhil) [en_US]
dc.identifier.department: Department of Computational Mathematics [en_US]
dc.identifier.faculty: IT [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/22898
dc.language.iso: en [en_US]
dc.subject: CODE-MIXING
dc.subject: MULTILINGUALISM
dc.subject: LSTM | TEACHER FORCING
dc.subject: COMPUTATIONAL MATHEMATICS – Dissertation
dc.subject: Master of Philosophy (MPhil)
dc.title: Sinhala code-mixed text translation using neural machine translation [en_US]
dc.type: Thesis-Full-text [en_US]
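The abstract lists the Teacher Forcing mechanism as part of the proposed Encoder-Decoder (Seq2Seq) LSTM approach. The snippet below is a minimal sketch of what teacher forcing means for the decoder, not the thesis's implementation: it assumes word-level tokens and hypothetical `<s>`/`</s>` boundary markers, and `step` is a hypothetical stand-in for one step of a trained LSTM decoder. During training the decoder is fed the ground-truth previous token; at inference it is fed its own previous prediction.

```python
from typing import Callable, List, Tuple

# Hypothetical sentence-boundary markers; the thesis does not specify its own.
BOS, EOS = "<s>", "</s>"


def teacher_forcing_pairs(target: List[str]) -> Tuple[List[str], List[str]]:
    """Build the decoder input and expected output for one training sentence.

    Under teacher forcing, the decoder input at step t is the ground-truth
    token from step t-1 (BOS at t=0), regardless of what the model predicted.
    """
    decoder_input = [BOS] + target
    decoder_target = target + [EOS]
    return decoder_input, decoder_target


def greedy_decode(step: Callable[[str], str], max_len: int = 50) -> List[str]:
    """Free-running inference: each predicted token becomes the next input.

    `step` is a hypothetical stand-in for one decoder step of a trained LSTM
    (a real decoder would also carry its hidden and cell state).
    """
    output: List[str] = []
    prev = BOS
    for _ in range(max_len):
        token = step(prev)
        if token == EOS:
            break
        output.append(token)
        prev = token
    return output


if __name__ == "__main__":
    # Illustrative Sinhala target sentence ("I am going home"), word tokens.
    target = ["මම", "ගෙදර", "යනවා"]
    dec_in, dec_out = teacher_forcing_pairs(target)
    for x, y in zip(dec_in, dec_out):
        print(f"input {x!r:>10} -> expected {y!r}")

    # Toy "model" that has memorised the sentence, to exercise the decode loop.
    toy = dict(zip(dec_in, dec_out))
    print(greedy_decode(lambda prev: toy[prev]))
```

Because training always conditions on the reference prefix, the decoder never sees its own mistakes during training; this mismatch with free-running inference (often called exposure bias) is the usual trade-off of teacher forcing.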

Files

Original bundle

Name: TH5542-1.pdf
Size: 195.88 KB
Format: Adobe Portable Document Format
Description: Pre-text
Name: TH5542-2.pdf
Size: 148.44 KB
Format: Adobe Portable Document Format
Description: Post-text
Name: TH5542.pdf
Size: 3.74 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission