Monolingual sentence similarity measurement using siamese neural networks for Sinhala and Tamil languages

dc.contributor.advisor: Ranathunga S
dc.contributor.author: Satkunanantham N
dc.date.accept: 2021
dc.date.accessioned: 2021
dc.date.available: 2021
dc.date.issued: 2021
dc.description.abstract: Sentence similarity plays a key role in text-processing research such as plagiarism checking and paraphrasing. So far, only conventional unsupervised sentence similarity techniques (string-based, corpus-based, knowledge-based, and hybrid approaches) have been used to measure sentence similarity for the Tamil and Sinhala languages. In this research, we introduce a deep learning methodology to measure sentence similarity for these two languages, which uses Siamese recurrent neural network techniques together with a word-embedding model as the input representation. This approach achieved a 3.07% higher Pearson correlation coefficient on the Tamil dataset of 2500 sentence pairs and a 3.61% higher Pearson correlation coefficient on the Sinhala dataset of 5000 sentence pairs. Both results outperform those of the conventional unsupervised sentence similarity techniques applied to the same datasets. [en_US]
dc.identifier.accno: TH4661 [en_US]
dc.identifier.citation: Satkunanantham, N. (2021). Monolingual sentence similarity measurement using siamese neural networks for Sinhala and Tamil languages [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/20465
dc.identifier.degree: MSc in Computer Science and Engineering [en_US]
dc.identifier.department: Department of Computer Science & Engineering [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/20465
dc.language.iso: en [en_US]
dc.subject: SENTENCE-SIMILARITY [en_US]
dc.subject: SINHALA, TAMIL [en_US]
dc.subject: SIAMESE NEURAL NETWORK [en_US]
dc.subject: LSTM [en_US]
dc.subject: DEEP-LEARNING [en_US]
dc.subject: FASTTEXT [en_US]
dc.subject: NATURAL LANGUAGE PROCESSING [en_US]
dc.subject: COMPUTER SCIENCE - Dissertation [en_US]
dc.subject: COMPUTER SCIENCE & ENGINEERING - Dissertation [en_US]
dc.subject: INFORMATION TECHNOLOGY – Dissertation [en_US]
dc.title: Monolingual sentence similarity measurement using siamese neural networks for Sinhala and Tamil languages [en_US]
dc.type: Thesis-Abstract [en_US]

Files

Original bundle

Name: TH4661-1.pdf
Size: 213.2 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH4661-2.pdf
Size: 143.77 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH4661.pdf
Size: 2.39 MB
Format: Adobe Portable Document Format
Description: Full-thesis