An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts

dc.contributor.author: Thavareesan, S
dc.contributor.author: Mahesan, S
dc.contributor.editor: Thayasivam, U
dc.contributor.editor: Rathnayaka, C
dc.date.accessioned: 2025-01-24T03:04:53Z
dc.date.available: 2025-01-24T03:04:53Z
dc.date.issued: 2020
dc.description.abstract: To develop a suitable approach to sentiment analysis of Tamil texts using K-means clustering with a k-Nearest Neighbour (k-NN) classifier, a corpus, UJ_Corpus_Opinions, consisting of 1518 positive and 1173 negative comments has been constructed. For training, 820 positive and 820 negative comments are taken; for testing, 650 positive and 350 negative comments. Bag of Words (BoW) and fastText vectors are used to create feature vectors. These feature vectors are clustered using K-means, and the cluster centroids are used as classification keys for the k-NN classifier. Two clustering techniques are used to develop two models: (i) clustering with class-wise information, and (ii) clustering with no class-wise information. These two models are also tested using K-Fold. All four resulting models are tested with both types of feature vectors. The models are tested with varying numbers of centroids (Kc: 1..10), neighbours (Kn: 1..Kc) and folds (Kf: 1..10) to study their influence on accuracy. Accuracy increases with Kc, and the highest accuracy (74%) is obtained for Kn=1 and Kf=2. Accuracy is, in general, higher with fastText than with BoW. The model with fastText, class-wise clustering and K-Fold, which obtained 74% accuracy, has an F1-score of 0.74. [en_US]
dc.identifier.citation: Thavareesan, S., & Mahesan, S. (2020). An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts. In U. Thayasivam & C. Rathnayaka (Eds.), Symposium on Natural Language Processing 2020: Proceedings of Symposium on Natural Language Processing 2020 (p. 14). National Language Processing Centre University of Moratuwa. http://dl.lib.uom.lk/handle/123/23261
dc.identifier.conference: Symposium on Natural Language Processing 2020 [en_US]
dc.identifier.pgnos: p. 14 [en_US]
dc.identifier.place: University of Moratuwa [en_US]
dc.identifier.proceeding: Proceedings of Symposium on Natural Language Processing 2020 [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/23261
dc.identifier.year: 2020 [en_US]
dc.language.iso: en [en_US]
dc.publisher: National Language Processing Centre University of Moratuwa Sri Lanka [en_US]
dc.subject: Sentiment Analysis [en_US]
dc.subject: Tamil [en_US]
dc.subject: K-means [en_US]
dc.subject: k-Nearest Neighbour [en_US]
dc.subject: fastText [en_US]
dc.title: An improved kNN algorithm using k-means and Fast text to predict sentiments expressed in Tamil texts [en_US]
dc.type: Conference-Abstract [en_US]
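
The class-wise variant described in the abstract (K-means centroids used as classification keys for a k-NN classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain Lloyd's k-means, the Euclidean distance, the majority vote, and the synthetic 2-D vectors standing in for BoW or fastText features are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def fit_classwise_centroids(X, y, kc):
    """Cluster each class separately; each centroid inherits its class label."""
    keys, key_labels = [], []
    for label in np.unique(y):
        keys.append(kmeans(X[y == label], kc))
        key_labels.extend([label] * kc)
    return np.vstack(keys), np.array(key_labels)

def predict(X, keys, key_labels, kn):
    """k-NN over the centroids: majority vote among the kn nearest keys."""
    dists = np.linalg.norm(X[:, None, :] - keys[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :kn]
    return np.array([np.bincount(v).argmax() for v in key_labels[nearest]])

# Synthetic, well-separated toy vectors (hypothetical; not the UJ_Corpus_Opinions data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2.0, 0.5, size=(40, 2)),    # label 1: "positive"
               rng.normal(-2.0, 0.5, size=(40, 2))])  # label 0: "negative"
y = np.array([1] * 40 + [0] * 40)

keys, key_labels = fit_classwise_centroids(X, y, kc=3)  # Kc = 3 centroids per class
pred = predict(X, keys, key_labels, kn=1)               # Kn = 1 neighbour
accuracy = (pred == y).mean()
print(f"centroids: {keys.shape[0]}, accuracy on toy data: {accuracy:.2f}")
```

Classifying against a handful of centroids rather than every training comment keeps the number of distance computations per query at 2 x Kc here, which is the practical appeal of combining K-means with k-NN. The variant without class-wise information would instead cluster all vectors together and would need some rule for labelling each centroid; the abstract does not say which rule is used, so it is not sketched.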

Files

Original bundle

Name: SNLP 2020-17.pdf
Size: 57.7 KB
Format: Adobe Portable Document Format
Description: SNLP 2020-17
