An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts

Thavareesan, S; Mahesan, S

dc.contributor.author	Thavareesan, S
dc.contributor.author	Mahesan, S
dc.contributor.editor	Thayasivam, U
dc.contributor.editor	Rathnayaka, C
dc.date.accessioned	2025-01-24T03:04:53Z
dc.date.available	2025-01-24T03:04:53Z
dc.date.issued	2020
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/23261
dc.description.abstract	To develop a suitable approach to Sentiment Analysis of Tamil texts using K-means clustering with a k-Nearest Neighbour (k-NN) classifier, a corpus, UJ_Corpus_Opinions, consisting of 1518 positive and 1173 negative comments has been constructed. For training, 820 positive and 820 negative comments are taken; for testing, 650 and 350 respectively. Bag of Words (BoW) and fastText vectors are used to create feature vectors. These feature vectors are clustered using K-means clustering, and the cluster centroids are used as classification keys for the k-NN classifier. Two clustering techniques are used to develop two models: (i) using class-wise information, and (ii) with no class-wise information. These two models are also tested using K-Fold, giving four models in all, and all four are tested with both types of feature vectors. The models are tested with varying numbers of centroids (Kc: 1..10), neighbours (Kn: 1..Kc) and folds (Kf: 1..10) to study their influence on accuracy. Accuracy increases with the value of Kc, and the highest accuracy (74%) is obtained for Kn=1 and Kf=2. Accuracy is, in general, higher with fastText than with BoW. The model with fastText and class-wise clustering with K-Fold, which obtained 74% accuracy, has an F1-score of 0.74.	en_US
dc.language.iso	en	en_US
dc.publisher	National Language Processing Centre, University of Moratuwa, Sri Lanka	en_US
dc.subject	Sentiment Analysis	en_US
dc.subject	Tamil	en_US
dc.subject	K-means	en_US
dc.subject	k-Nearest Neighbour	en_US
dc.subject	fastText	en_US
dc.title	An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts	en_US
dc.type	Conference-Abstract	en_US
dc.identifier.year	2020	en_US
dc.identifier.conference	Symposium on Natural Language Processing 2020	en_US
dc.identifier.place	University of Moratuwa	en_US
dc.identifier.pgnos	p. 14	en_US
dc.identifier.proceeding	Proceedings of Symposium on Natural Language Processing 2020	en_US
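
The class-wise variant described in the abstract (cluster each class with K-means, then run k-NN against the labelled centroids) might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the function names (kmeans, fit_classwise, predict), the Euclidean distance, the random initialisation and the two-dimensional toy vectors are all assumptions, and the toy vectors merely stand in for BoW or fastText feature vectors.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns k centroid vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fit_classwise(vectors, labels, kc):
    """Cluster each class separately; each centroid inherits its class label."""
    keys = []
    for label in sorted(set(labels)):
        members = [v for v, l in zip(vectors, labels) if l == label]
        keys += [(c, label) for c in kmeans(members, kc)]
    return keys


def predict(keys, vector, kn=1):
    """k-NN over the centroids: majority label among the kn nearest keys."""
    nearest = sorted(keys, key=lambda key: math.dist(vector, key[0]))[:kn]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D "feature vectors".
train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]

keys = fit_classwise(train, labels, kc=2)   # Kc = 2 centroids per class
guess_a = predict(keys, [0.1, 0.1], kn=1)   # Kn = 1 nearest centroid
guess_b = predict(keys, [5.0, 5.0], kn=3)   # Kn = 3, majority vote
```

Because the classifier compares a test vector only against the Kc centroids per class rather than against every training comment, the number of distance computations at prediction time depends on Kc and not on the corpus size. The variant without class-wise information would need a separate rule for labelling each centroid, which the abstract does not specify, so it is not sketched here.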