Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages

dc.contributor.author: Nilaxan, S
dc.contributor.author: Ranathunga, S
dc.contributor.editor: Adhikariwatte, W
dc.contributor.editor: Rathnayake, M
dc.contributor.editor: Hemachandra, K
dc.date.accessioned: 2022-10-19T05:49:35Z
dc.date.available: 2022-10-19T05:49:35Z
dc.date.issued: 2021-07
dc.description.abstract: Sentence similarity is useful in many Natural Language Processing tasks such as plagiarism checking and paraphrasing. So far, only conventional unsupervised sentence similarity measurement techniques (knowledge-based, corpus-based, string similarity-based, and hybrid) have been used to measure sentence similarity for Tamil and Sinhala languages. In this paper, we present a Deep Learning technique to measure sentence similarity for these two languages, which makes use of a Siamese Neural Network that consists of two Long Short-Term Memory (LSTM) networks, and neural word embeddings as the input representation. This approach achieved a 3.07% higher Pearson correlation coefficient for the dataset of 2500 Tamil sentence pairs, and a 3.61% higher Pearson correlation for the dataset of 5000 Sinhala sentence pairs over the conventional unsupervised sentence similarity measurement techniques. [en_US]
dc.identifier.citation: S. Nilaxan and S. Ranathunga, "Monolingual Sentence Similarity Measurement using Siamese Neural Networks for Sinhala and Tamil Languages," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 567-572, doi: 10.1109/MERCon52712.2021.9525786. [en_US]
dc.identifier.conference: Moratuwa Engineering Research Conference 2021 [en_US]
dc.identifier.department: Engineering Research Unit, University of Moratuwa [en_US]
dc.identifier.doi: 10.1109/MERCon52712.2021.9525786 [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.pgnos: pp. 567-572 [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.proceeding: Proceedings of Moratuwa Engineering Research Conference 2021 [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/19133
dc.identifier.year: 2021 [en_US]
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.uri: https://ieeexplore.ieee.org/document/9525786/ [en_US]
dc.subject: sentence similarity [en_US]
dc.subject: Siamese neural networks [en_US]
dc.subject: long short-term memory (LSTM) [en_US]
dc.subject: Sinhala [en_US]
dc.subject: Tamil [en_US]
dc.subject: Word embeddings [en_US]
dc.subject: FastText [en_US]
dc.title: Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages [en_US]
dc.type: Conference-Full-text [en_US]
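
Note on the evaluation metric: the abstract reports its results as Pearson correlation coefficients between predicted and reference similarity scores. The sketch below shows how such a coefficient can be computed. It is a minimal illustration and not the authors' evaluation code; the function name `pearson` and the lists `gold` and `predicted` are hypothetical, and the scores are made-up toy values rather than data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy scores, NOT taken from the paper's datasets.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]       # assumed human-annotated similarity
predicted = [0.1, 0.9, 2.2, 2.8, 4.1]  # assumed model output
r = pearson(gold, predicted)
print(f"Pearson r = {r:.4f}")
```

A value of r close to 1 means the predicted scores rise and fall with the reference scores. Comparing r across systems on the same sentence pairs is the kind of comparison the abstract summarises.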
