Institutional-Repository, University of Moratuwa.  

Sentiment lexicon expansion using word2vec and fasttext for sentiment prediction in Tamil


dc.contributor.author Thavareesan, S
dc.contributor.author Mahesan, S
dc.contributor.editor Weeraddana, C
dc.contributor.editor Edussooriya, CUS
dc.contributor.editor Abeysooriya, RP
dc.date.accessioned 2022-08-09T09:28:06Z
dc.date.available 2022-08-09T09:28:06Z
dc.date.issued 2020-07
dc.identifier.citation S. Thavareesan and S. Mahesan, "Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 272-276, doi: 10.1109/MERCon50084.2020.9185369. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/18581
dc.description.abstract Sentiment Analysis is the process of identifying the sentiments expressed in a text and categorising them as positive or negative. The words that carry sentiment are key to sentiment prediction. SentiWordNet is the sentiment lexicon used to determine the sentiment of texts. A large number of sentiment terms are missing from SentiWordNet, which limits the performance of Sentiment Analysis. Gathering and grouping such sentiment words manually is a tedious task. In this paper we propose a sentiment lexicon expansion method using Word2vec and fastText word embeddings along with a rule-based Sentiment Analysis method. We expand the sentiment lexicon from an initial seed list of 2951 positive and 5598 negative words in two steps: (i) gathering related words using Word2vec word embedding and (ii) gathering lexically similar words using fastText word embedding. Our final lexicons, UJ_Lex_Pos and UJ_Lex_Neg, contain 10537 positive and 12664 negative words respectively, which are labelled using Word2vec word embedding. Furthermore, the rule-based Sentiment Analysis method uses the expanded lexicons (UJ_Lex_Pos and UJ_Lex_Neg) and lists of conjunctions and negation words to predict the sentiments expressed in Tamil texts. The method is evaluated on UJ_MovieReviews and an accuracy of 88 ± 0.14% is obtained. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/9185369 en_US
dc.subject Sentiment analysis en_US
dc.subject Tamil en_US
dc.subject lexicon en_US
dc.subject conjunction en_US
dc.subject grammar rule en_US
dc.title Sentiment lexicon expansion using word2vec and fasttext for sentiment prediction in Tamil en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2020 en_US
dc.identifier.conference Moratuwa Engineering Research Conference 2020 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.pgnos pp. 272-276 en_US
dc.identifier.proceeding Proceedings of Moratuwa Engineering Research Conference 2020 en_US
dc.identifier.email sajeethas@esn.ac.lk en_US
dc.identifier.email mahesans@univ.jfn.ac.lk en_US
dc.identifier.doi 10.1109/MERCon50084.2020.9185369 en_US
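
Illustrative note: the abstract describes expanding a seed lexicon by gathering words that lie close to seed words in an embedding space and labelling them by polarity. The sketch below shows that general idea only; it is not the authors' implementation. The function names, the similarity threshold of 0.8, the English placeholder words and the two-dimensional toy vectors are all assumptions made for illustration. A real run would presumably load trained Word2vec or fastText vectors for Tamil instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds_pos, seeds_neg, vectors, threshold=0.8):
    """Add each non-seed word to the lexicon of its most similar seed.

    A word is added only if its best similarity to any seed reaches
    `threshold` (a hypothetical cut-off, not a value from the paper).
    """
    pos, neg = set(seeds_pos), set(seeds_neg)
    for word, vec in vectors.items():
        if word in pos or word in neg:
            continue
        sim_pos = max(
            (cosine(vec, vectors[s]) for s in seeds_pos if s in vectors),
            default=0.0,
        )
        sim_neg = max(
            (cosine(vec, vectors[s]) for s in seeds_neg if s in vectors),
            default=0.0,
        )
        if max(sim_pos, sim_neg) < threshold:
            continue  # not close enough to any seed word
        (pos if sim_pos >= sim_neg else neg).add(word)
    return pos, neg


# Toy vectors standing in for trained embeddings (assumed data).
TOY_VECTORS = {
    "good": (1.0, 0.1),
    "great": (0.9, 0.2),
    "bad": (-1.0, 0.1),
    "awful": (-0.9, 0.2),
    "table": (0.0, 1.0),
}

pos, neg = expand_lexicon({"good"}, {"bad"}, TOY_VECTORS)
print(sorted(pos), sorted(neg))
```

In this toy setup "great" joins the positive lexicon, "awful" joins the negative one, and "table" is left out because it is not close to either seed.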

