Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages

Nilaxan, S; Ranathunga, S

dc.contributor.author	Nilaxan, S
dc.contributor.author	Ranathunga, S
dc.contributor.editor	Adhikariwatte, W
dc.contributor.editor	Rathnayake, M
dc.contributor.editor	Hemachandra, K
dc.date.accessioned	2022-10-19T05:49:35Z
dc.date.available	2022-10-19T05:49:35Z
dc.date.issued	2021-07
dc.identifier.citation	S. Nilaxan and S. Ranathunga, "Monolingual Sentence Similarity Measurement using Siamese Neural Networks for Sinhala and Tamil Languages," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 567-572, doi: 10.1109/MERCon52712.2021.9525786.	en_US
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/19133
dc.description.abstract	Sentence similarity is useful in many Natural Language Processing tasks such as plagiarism checking and paraphrasing. So far, only conventional unsupervised sentence similarity measurement techniques (knowledge-based, corpus-based, string similarity-based, and hybrid) have been used to measure sentence similarity for Tamil and Sinhala languages. In this paper, we present a Deep Learning technique to measure sentence similarity for these two languages, which makes use of a Siamese Neural Network that consists of two Long Short-Term Memory (LSTM) networks, and neural word embeddings as the input representation. This approach achieved a 3.07% higher Pearson correlation coefficient for the dataset of 2500 Tamil sentence pairs, and a 3.61% higher Pearson correlation for the dataset of 5000 Sinhala sentence pairs over the conventional unsupervised sentence similarity measurement techniques.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.uri	https://ieeexplore.ieee.org/document/9525786/	en_US
dc.subject	sentence similarity	en_US
dc.subject	Siamese neural networks	en_US
dc.subject	long short-term memory (LSTM)	en_US
dc.subject	Sinhala	en_US
dc.subject	Tamil	en_US
dc.subject	Word embeddings	en_US
dc.subject	FastText	en_US
dc.title	Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages	en_US
dc.type	Conference-Full-text	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.department	Engineering Research Unit, University of Moratuwa	en_US
dc.identifier.year	2021	en_US
dc.identifier.conference	Moratuwa Engineering Research Conference 2021	en_US
dc.identifier.place	Moratuwa, Sri Lanka	en_US
dc.identifier.pgnos	pp. 567-572	en_US
dc.identifier.proceeding	Proceedings of Moratuwa Engineering Research Conference 2021	en_US
dc.identifier.doi	10.1109/MERCon52712.2021.9525786	en_US
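
The abstract reports its results as Pearson correlation coefficients between predicted similarity scores and reference scores. The following is a minimal sketch of that evaluation step only, not the authors' code. The function name `pearson` and the sample scores are hypothetical, chosen purely for illustration, and the 0 to 5 score range is an assumption, not something stated in the record.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: human-annotated similarity scores and model predictions
# for five sentence pairs (assumed 0-5 scale, illustrative values only).
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
predicted = [0.4, 1.2, 2.9, 4.1, 4.8]

r = pearson(gold, predicted)
print(f"Pearson r = {r:.4f}")
```

A value of r close to 1 means the predicted scores rise and fall with the reference scores. Comparing r across systems on the same sentence pairs is how an improvement such as the one quoted in the abstract would be measured.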