Monolingual Sentence Similarity Measurement using Siamese Neural Networks for Sinhala and Tamil Languages

Nilaxan, S; Ranathunga, S

Date

2021-07

Authors

Nilaxan, S

Ranathunga, S

Publisher

IEEE

Abstract

Sentence similarity is useful in many Natural Language Processing tasks, such as plagiarism checking and paraphrasing. So far, only conventional unsupervised techniques (knowledge-based, corpus-based, string similarity-based, and hybrid) have been used to measure sentence similarity for the Tamil and Sinhala languages. In this paper, we present a Deep Learning technique for measuring sentence similarity in these two languages. It uses a Siamese Neural Network consisting of two Long Short-Term Memory (LSTM) networks, with neural word embeddings as the input representation. Compared with the conventional unsupervised techniques, this approach achieved a 3.07% higher Pearson correlation coefficient on the dataset of 2500 Tamil sentence pairs and a 3.61% higher Pearson correlation coefficient on the dataset of 5000 Sinhala sentence pairs.
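
The abstract reports results as Pearson correlation coefficients between predicted and reference similarity scores. As a minimal sketch of how that metric can be computed (not the authors' evaluation code), the following uses only the Python standard library. The function name `pearson` and the sample scores are assumptions made for illustration; the values are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variance sums (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five sentence pairs (illustration only, not paper data).
gold = [0.0, 1.5, 3.0, 4.0, 5.0]        # human-annotated similarity
predicted = [0.4, 1.2, 2.9, 3.5, 4.8]   # model output
r = pearson(gold, predicted)
print(f"Pearson r = {r:.4f}")
```

A value near 1 indicates that the predicted scores rise and fall with the reference scores; the coefficient is insensitive to the scale of either list.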

Citation

S. Nilaxan and S. Ranathunga, "Monolingual Sentence Similarity Measurement using Siamese Neural Networks for Sinhala and Tamil Languages," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 567-572, doi: 10.1109/MERCon52712.2021.9525786.

URI

http://dl.lib.uom.lk/handle/123/19133

DOI

10.1109/MERCon52712.2021.9525786

Collections

MERCon - 2021
