Monolingual sentence similarity measurement using Siamese neural networks for Sinhala and Tamil languages
Date
2021-07
Publisher
IEEE
Abstract
Sentence similarity is useful in many Natural Language Processing tasks, such as plagiarism checking and paraphrasing. So far, only conventional unsupervised sentence similarity measurement techniques (knowledge-based, corpus-based, string similarity-based, and hybrid) have been used to measure sentence similarity for the Tamil and Sinhala languages. In this paper, we present a Deep Learning technique for measuring sentence similarity in these two languages. It uses a Siamese Neural Network that consists of two Long Short-Term Memory (LSTM) networks, with neural word embeddings as the input representation. Compared with the conventional unsupervised techniques, this approach achieved a 3.07% higher Pearson correlation coefficient on the dataset of 2500 Tamil sentence pairs and a 3.61% higher Pearson correlation coefficient on the dataset of 5000 Sinhala sentence pairs.
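To make the architecture in the abstract concrete, the following is a minimal sketch of a Siamese LSTM similarity scorer, not the authors' implementation. It assumes several things the abstract does not state: a single-layer LSTM whose weights are shared by both branches, a similarity score of exp(-L1 distance) between the two final hidden states, and randomly initialised (untrained) weights. The names `TinyLSTM`, `toy_embedding`, `siamese_similarity`, and `pearson` are hypothetical and exist only for illustration. The `pearson` helper shows how predicted scores could be compared against gold similarity labels.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_embedding(token_id, dim):
    """Hypothetical stand-in for a trained word embedding: a fixed random vector per token id."""
    rng = random.Random(token_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class TinyLSTM:
    """Single-layer LSTM encoder with random, untrained weights (assumption: illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        # Each row is applied to the concatenation [x_t, h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _gate(self, gate, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[gate], self.b[gate])]

    def encode(self, vectors):
        """Run the LSTM over a sequence of word vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            z = list(x) + h
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def siamese_similarity(encoder, sent_a, sent_b):
    """Encode both sentences with the SAME encoder (shared weights) and score them.

    Assumption: similarity = exp(-L1 distance), which lies in (0, 1].
    """
    h_a = encoder.encode(sent_a)
    h_b = encoder.encode(sent_b)
    l1 = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1)


def pearson(xs, ys):
    """Pearson correlation coefficient between predicted and gold scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    encoder = TinyLSTM(input_dim=4, hidden_dim=3)
    sentence_a = [toy_embedding(t, 4) for t in (1, 2, 3)]  # hypothetical token ids
    sentence_b = [toy_embedding(t, 4) for t in (1, 2, 4)]
    print(siamese_similarity(encoder, sentence_a, sentence_b))
```

Because the weights here are random, the scores carry no linguistic meaning; the sketch only illustrates the data flow. In a trained system the shared weights would be learned from labelled sentence pairs, and the Pearson correlation would be computed over the full evaluation set.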
Citation
S. Nilaxan and S. Ranathunga, "Monolingual Sentence Similarity Measurement using Siamese Neural Networks for Sinhala and Tamil Languages," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 567-572, doi: 10.1109/MERCon52712.2021.9525786.