dc.contributor.advisor Sumathipala S
dc.contributor.advisor Silva T
dc.contributor.author Archchana, K
dc.date.accessioned 2024-10-10T07:59:10Z
dc.date.available 2024-10-10T07:59:10Z
dc.date.issued 2024
dc.identifier.citation Archchana, K. (2024). Sinhala code-mixed text translation using neural machine translation [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22898
dc.identifier.uri http://dl.lib.uom.lk/handle/123/22898
dc.description.abstract Code-mixing is the mixing of two or more languages in communication. It has become common in South Asian communities because of bilingualism and multilingualism. Sinhala-English code-mixed (SECM) text is the most popular form of language used in Sri Lanka in casual communication such as social media comments, posts, and chats. On social media platforms, content such as posts and comments is used for personalized advertisement, post, and content recommendations, so that users are served according to their interests. Because of its code-mixed nature, most Sri Lankan social media content goes unused for recommendation purposes. This research therefore focuses on translating SECM text into Sinhala. Once the content is converted to a standard language, it can be processed easily and used for these purposes. In this research, we first conduct an in-depth analysis of Sinhala-English code-mixed text and identify the issues that act as barriers to translating SECM into Sinhala. We also conduct a thorough literature study of code-mixed text analysis. An SECM-Sinhala parallel corpus of 5000 parallel sentences is used for this study. The proposed approach to SECM-to-Sinhala translation consists of a normalization layer, an Encoder-Decoder (Seq2Seq) framework, LSTM, and a Teacher Forcing mechanism. We evaluated our approach against other approaches proposed for code-mixed text translation, and ours achieved a significantly higher BLEU score. Keywords: Code-mixing, Bilingualism, Multilingualism, LSTM, Teacher Forcing en_US
dc.language.iso en en_US
dc.subject CODE-MIXING
dc.subject MULTILINGUALISM
dc.subject LSTM
dc.subject TEACHER FORCING
dc.subject COMPUTATIONAL MATHEMATICS - Dissertation
dc.subject Master of Philosophy (MPhil)
dc.title Sinhala code-mixed text translation using neural machine translation en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty IT en_US
dc.identifier.degree Master of Philosophy (MPhil) en_US
dc.identifier.department Department of Computational Mathematics en_US
dc.date.accept 2024
dc.identifier.accno TH5542 en_US

