Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages
dc.contributor.author | Nilaxan, S | |
dc.contributor.author | Ranathunga, S | |
dc.contributor.editor | Adhikariwatte, W | |
dc.contributor.editor | Rathnayake, M | |
dc.contributor.editor | Hemachandra, K | |
dc.date.accessioned | 2022-10-19T05:49:35Z | |
dc.date.available | 2022-10-19T05:49:35Z | |
dc.date.issued | 2021-07 | |
dc.description.abstract | Sentence similarity is useful in many Natural Language Processing tasks such as plagiarism checking and paraphrasing. So far, only conventional unsupervised sentence similarity measurement techniques (knowledge-based, corpus-based, string similarity-based, and hybrid) have been used to measure sentence similarity for Tamil and Sinhala languages. In this paper, we present a Deep Learning technique to measure sentence similarity for these two languages, which makes use of a Siamese Neural Network that consists of two Long Short-Term Memory (LSTM) networks, and neural word embeddings as the input representation. This approach achieved a 3.07% higher Pearson correlation coefficient for the dataset of 2500 Tamil sentence pairs, and a 3.61% higher Pearson correlation for the dataset of 5000 Sinhala sentence pairs over the conventional unsupervised sentence similarity measurement techniques. | en_US |
dc.identifier.citation | S. Nilaxan and S. Ranathunga, "Monolingual Sentence Similarity Measurement using Siamese Neural Networks for Sinhala and Tamil Languages," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 567-572, doi: 10.1109/MERCon52712.2021.9525786. | en_US |
dc.identifier.conference | Moratuwa Engineering Research Conference 2021 | en_US |
dc.identifier.department | Engineering Research Unit, University of Moratuwa | en_US |
dc.identifier.doi | 10.1109/MERCon52712.2021.9525786 | en_US |
dc.identifier.faculty | Engineering | en_US |
dc.identifier.pgnos | pp. 567-572 | en_US |
dc.identifier.place | Moratuwa, Sri Lanka | en_US |
dc.identifier.proceeding | Proceedings of Moratuwa Engineering Research Conference 2021 | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/19133 | |
dc.identifier.year | 2021 | en_US |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.relation.uri | https://ieeexplore.ieee.org/document/9525786/ | en_US |
dc.subject | sentence similarity | en_US |
dc.subject | Siamese neural networks | en_US |
dc.subject | long short-term memory (LSTM) | en_US |
dc.subject | Sinhala | en_US |
dc.subject | Tamil | en_US |
dc.subject | Word embeddings | en_US |
dc.subject | FastText | en_US |
dc.title | Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages | en_US |
dc.type | Conference-Full-text | en_US |