Institutional-Repository, University of Moratuwa.  

An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts

dc.contributor.author Thavareesan, S
dc.contributor.author Mahesan, S
dc.contributor.editor Thayasivam, U
dc.contributor.editor Rathnayaka, C
dc.date.accessioned 2025-01-24T03:04:53Z
dc.date.available 2025-01-24T03:04:53Z
dc.date.issued 2020
dc.identifier.uri http://dl.lib.uom.lk/handle/123/23261
dc.description.abstract To develop a suitable approach to Sentiment Analysis of Tamil texts using K-means clustering with a k-Nearest Neighbour (k-NN) classifier, a corpus, UJ_Corpus_Opinions, consisting of 1518 positive and 1173 negative comments was constructed. For training, 820 positive and 820 negative comments are taken; for testing, 650 and 350 respectively. Bag of Words (BoW) and fastText vectors are used to create feature vectors. These feature vectors are clustered using K-means clustering, and the cluster centroids are used as classification keys for the k-NN classifier. Two types of clustering are used to develop two models: (i) using class-wise information, and (ii) with no class-wise information. These two models are tested using K-Fold. All these four models are tested with the two types of feature vectors. The models are tested with varying numbers of centroids (Kc:1..10), neighbours (Kn:1..Kc) and folds (Kf:1..10) to study their influence on accuracy. The accuracy increases with the value of Kc, and the highest accuracy (74%) is obtained for Kn=1 and Kf=2. Accuracy is generally higher with fastText than with BoW. The model with fastText and class-wise clustering with K-Fold, which obtained 74% accuracy, has an F1-Score of 0.74. en_US
dc.language.iso en en_US
dc.publisher National Language Processing Centre University of Moratuwa Sri Lanka en_US
dc.subject Sentiment Analysis en_US
dc.subject Tamil en_US
dc.subject K-means en_US
dc.subject k-Nearest Neighbour en_US
dc.subject fastText en_US
dc.title An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts en_US
dc.type Conference-Abstract en_US
dc.identifier.year 2020 en_US
dc.identifier.conference Symposium on Natural Language Processing 2020 en_US
dc.identifier.place University of Moratuwa en_US
dc.identifier.pgnos p. 14 en_US
dc.identifier.proceeding Proceedings of Symposium on Natural Language Processing 2020 en_US
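
The abstract describes clustering feature vectors with K-means and using the resulting centroids as classification keys for k-NN. The following is a minimal sketch of the class-wise variant under stated assumptions, not the authors' implementation: the function names (`kmeans`, `fit_classwise_centroids`, `predict`), the Euclidean distance, the fixed iteration count, and the toy 2-D vectors standing in for BoW or fastText features are all illustrative choices that the record does not specify.

```python
import math
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def fit_classwise_centroids(vectors, labels, kc):
    """Cluster each class separately (assumed reading of 'class-wise');
    every centroid inherits the label of the class it was built from."""
    keys = []
    for cls in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        for c in kmeans(members, min(kc, len(members))):
            keys.append((c, cls))
    return keys

def predict(keys, vector, kn):
    """Majority vote among the kn centroids nearest to the vector."""
    nearest = sorted(keys, key=lambda key: math.dist(vector, key[0]))[:kn]
    return Counter(cls for _, cls in nearest).most_common(1)[0][0]

# Toy data: two well-separated blobs standing in for real feature vectors.
train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
         [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]

keys = fit_classwise_centroids(train, labels, kc=2)  # Kc = 2 per class
print(predict(keys, [0.1, 0.1], kn=1))  # Kn = 1
print(predict(keys, [5.1, 5.0], kn=1))
```

Because a test vector is compared only against the centroids rather than every training comment, the number of distance computations per prediction depends on Kc and not on the size of the training set.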

