Abstract:
Sentiment Analysis is the process of identifying
and categorising the sentiments expressed in a
text as positive or negative. The words that carry
sentiment are the key to sentiment prediction.
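The following minimal lexicon-based scorer shows why sentiment-bearing words are the main signal. It is a sketch only. The lexicons are hypothetical English placeholders, whereas the paper's lexicons are Tamil. It also assumes two rules in the spirit of the negation and conjunction lists this abstract mentions: a negation word flips the polarity of the next sentiment word, and the clause after the last contrastive conjunction decides the result. `POSITIVE`, `NEGATIVE`, `NEGATIONS`, `CONTRAST`, `clause_score` and `predict` are illustrative names and are not taken from the paper.

```python
# Hypothetical placeholder lexicons; the paper's UJ_Lex_Pos / UJ_Lex_Neg are Tamil word lists.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "weak"}
NEGATIONS = {"not", "never"}      # assumed negation markers
CONTRAST = {"but", "however"}     # assumed contrastive conjunctions


def clause_score(tokens):
    """Sum word polarities, flipping the next sentiment word after a negation."""
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True  # assumed scope: affects only the next sentiment word
            continue
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False
    return score


def predict(text):
    tokens = text.lower().split()
    # Assumed rule: the clause after the last contrastive conjunction dominates.
    cut = max((i for i, t in enumerate(tokens) if t in CONTRAST), default=-1)
    score = clause_score(tokens[cut + 1:])
    if score == 0:
        score = clause_score(tokens)  # no signal after the conjunction: use the whole text
    return "positive" if score >= 0 else "negative"  # ties default to positive (assumption)


print(predict("the story was boring but the acting was great"))  # positive
```

Giving priority to the clause after "but" is a common heuristic in lexicon-based systems. The paper's actual handling of conjunctions and negation may differ.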
SentiWordNet is a sentiment lexicon used to determine
the sentiment of texts. However, a large number of
sentiment terms are absent from SentiWordNet, which limits
the performance of Sentiment Analysis. Gathering and
grouping such sentiment words manually is a tedious
task. In this paper we propose a sentiment lexicon
expansion method using Word2vec and fastText word
embeddings, together with a rule-based Sentiment Analysis
method. We expand the sentiment lexicon from the
initial seed list of 2951 positive and 5598 negative
words in two steps: (i) gathering related words using
Word2vec word embeddings and (ii) gathering lexically
similar words using fastText word embeddings. The final
lexicons, UJ_Lex_Pos and UJ_Lex_Neg, contain 10537
positive and 12664 negative words respectively, which
are labelled using Word2vec word embeddings. Furthermore,
the rule-based Sentiment Analysis method uses the
expanded lexicons (UJ_Lex_Pos and UJ_Lex_Neg) together with
lists of conjunctions and negation words to predict the
sentiments expressed in Tamil texts. The method is evaluated
on UJ_MovieReviews and an accuracy of 88 0.14%
is obtained.
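The following sketch illustrates the two-step expansion and the Word2vec-based labelling described above. It rests on several assumptions. Toy 2-D vectors stand in for trained Word2vec embeddings. A Jaccard overlap of character trigrams stands in for fastText similarity; fastText builds word vectors from character n-grams, which is why it tends to surface lexically similar words. The thresholds `sem_thr` and `lex_thr` are hypothetical, as is the rule that assigns each new word to the polarity whose seeds it is closer to on average. None of these choices comes from the paper. With real models, step (i) would more likely call something like gensim's `KeyedVectors.most_similar`.

```python
import math

# Hypothetical toy "embeddings" standing in for a trained Word2vec model.
W2V = {
    "good": [1.0, 0.1], "bad": [-1.0, 0.1],
    "nice": [0.9, 0.2], "awful": [-0.9, 0.2],
    "goodness": [0.5, 0.8], "nicely": [0.5, 0.75],
    "badly": [-0.6, 0.7], "table": [0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, similar in spirit to fastText subwords."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def lexical_sim(a, b):
    """Jaccard overlap of character n-grams (a stand-in for fastText similarity)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def expand(pos_seeds, neg_seeds, sem_thr=0.9, lex_thr=0.3):
    seeds = pos_seeds | neg_seeds
    # Step (i): semantically related words, i.e. close neighbours of a seed in Word2vec space.
    related = {w for w in W2V if w not in seeds
               and any(cosine(W2V[w], W2V[s]) >= sem_thr for s in seeds)}
    # Step (ii): lexically similar words of the seeds and of the step-(i) words.
    base = seeds | related
    lexical = {w for w in W2V if w not in base
               and any(lexical_sim(w, b) >= lex_thr for b in base)}
    # Labelling (assumed rule): compare mean cosine similarity to each seed set.
    pos, neg = set(pos_seeds), set(neg_seeds)
    for w in related | lexical:
        p_sim = sum(cosine(W2V[w], W2V[s]) for s in pos_seeds) / len(pos_seeds)
        n_sim = sum(cosine(W2V[w], W2V[s]) for s in neg_seeds) / len(neg_seeds)
        (pos if p_sim > n_sim else neg).add(w)
    return pos, neg


pos, neg = expand({"good"}, {"bad"})
print(sorted(pos))  # ['good', 'goodness', 'nice', 'nicely']
print(sorted(neg))  # ['awful', 'bad', 'badly']
```

In this toy setup, step (i) finds only "nice" and "awful". Step (ii) then adds the morphological variants "goodness", "nicely" and "badly", whose vectors lie too far from the seeds to pass the semantic threshold. This mirrors the motivation for the second, lexical step. The unrelated word "table" is never picked up.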
Citation:
S. Thavareesan and S. Mahesan, "Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 272-276, doi: 10.1109/MERCon50084.2020.9185369.