Monolingual sentence similarity measurement using siamese neural networks for Sinhala and Tamil languages

Satkunanantham N

Files

TH4661-1.pdf (213.2 KB)

TH4661-2.pdf (143.77 KB)

TH4661.pdf (2.39 MB)

Date

2021

Authors

Satkunanantham N

Abstract

Sentence similarity plays a key role in text-processing research such as plagiarism checking and paraphrasing. So far, only conventional unsupervised techniques (string-based, corpus-based, knowledge-based, and hybrid approaches) have been used to measure sentence similarity for the Tamil and Sinhala languages. In this research, we introduce a deep learning methodology for measuring sentence similarity in these two languages, which uses Siamese recurrent neural network techniques together with a word-embedding model as the input representation. This approach achieved a 3.07% higher Pearson correlation coefficient on the Tamil dataset of 2500 sentence pairs and a 3.61% higher Pearson correlation coefficient on the Sinhala dataset of 5000 sentence pairs. Both results outperform those of the conventional unsupervised sentence similarity techniques applied to the same datasets.
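The abstract does not state which recurrent cell, similarity function, or embedding model the thesis uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a toy Elman-style recurrent encoder with random, untrained weights, a similarity score of exp(-L1 distance) between the two sentence encodings, and placeholder word vectors; the names `make_params`, `encode`, `siamese_similarity`, and `pearson` are hypothetical and not taken from the thesis. What it illustrates is the defining property of a Siamese network (both sentences pass through the same weights) and how predicted scores could be compared with gold scores using the Pearson correlation coefficient.

```python
import math
import random


def make_params(embed_dim, hidden_dim, seed=0):
    """Random weights for a toy Elman RNN (untrained; illustration only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_in": mat(hidden_dim, embed_dim), "W_rec": mat(hidden_dim, hidden_dim)}


def encode(word_vectors, params):
    """Run the RNN over a sentence's word vectors; return the final hidden state."""
    hidden = [0.0] * len(params["W_rec"])
    for x in word_vectors:
        # The comprehension reads the previous hidden state before reassignment.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in_row, x))
                + sum(w * hi for w, hi in zip(w_rec_row, hidden))
            )
            for w_in_row, w_rec_row in zip(params["W_in"], params["W_rec"])
        ]
    return hidden


def siamese_similarity(sent_a, sent_b, params):
    """Encode both sentences with the SAME weights, then map L1 distance to (0, 1]."""
    h_a = encode(sent_a, params)
    h_b = encode(sent_b, params)
    return math.exp(-sum(abs(a - b) for a, b in zip(h_a, h_b)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Placeholder 3-dimensional "embeddings" for hypothetical tokens w1..w4.
# A real system would look these up in a trained Sinhala or Tamil embedding model.
toy_embeddings = {
    "w1": [0.2, -0.1, 0.4],
    "w2": [0.0, 0.3, -0.2],
    "w3": [-0.3, 0.1, 0.1],
    "w4": [0.5, 0.5, -0.4],
}
params = make_params(embed_dim=3, hidden_dim=4)

sentence_a = [toy_embeddings[t] for t in ["w1", "w2", "w3"]]
sentence_b = [toy_embeddings[t] for t in ["w1", "w2", "w4"]]

score_same = siamese_similarity(sentence_a, sentence_a, params)
score_diff = siamese_similarity(sentence_a, sentence_b, params)
```

Because the weights here are random, `score_diff` carries no linguistic meaning; in a trained model the shared encoder would be fitted so that the scores track human similarity judgements, and `pearson(predicted_scores, gold_scores)` over a held-out set would give the kind of figure the abstract reports.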

Citation

Satkunanantham, N. (2021). Monolingual sentence similarity measurement using siamese neural networks for Sinhala and Tamil languages [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/20465

URI

http://dl.lib.uom.lk/handle/123/20465

Collections

Master of Science in Computer Science and Engineering
