Aspects identification and sentiment analysis for code-mixed Sinhala-English social media comments in the telecommunication domain

dc.contributor.advisorRanathunga S
dc.contributor.authorChathuranga NAHWS
dc.date.accept2021
dc.date.accessioned2021
dc.date.available2021
dc.date.issued2021
dc.description.abstractIn the modern business world, the customer experience department is vital to any kind of business. A company's profit depends heavily on the customer experience optimization strategies it follows, so implementing the best such strategies is vital. Identifying customer problems in real time helps to improve the customer experience of the brand. Social media is the best way to identify customer issues, since people tend to express their feelings towards a company in social media comments. In this research, sentiment analysis and aspect prediction are performed to classify customer comments into different areas and to identify the sentiment of each comment. The research focuses on the telecommunication domain because no such study has previously been done for this domain and because a high volume of data is available on social media compared to other domains. In the Sri Lankan context, most social media comments are written in Singlish, the most commonly used way of writing comments on social media. The lack of Singlish language resources poses challenges ranging from gathering and generating datasets to stemming, lemmatizing, and stop word removal. This research overcomes these challenges by developing a Singlish dataset for training the two models and by developing word embeddings for the Singlish language. Word2vec and FastText word embeddings are trained on Singlish comments for the baseline model, and the best word embedding model and embedding size are identified. The sentiment and aspect prediction models are then trained with the best word embedding model. Logistic regression, random forest, Naive Bayes, and SVM models are trained as the basic models. Deep learning-based models such as GRU, LSTM, and CNN-based models are also trained. The proposed approach, which is based on capsule networks and the bidirectional GRU model, outperforms all state-of-the-art models. Accuracy, weighted precision, weighted recall, and weighted F1 score are used to determine which model is the most effective.en_US
dc.identifier.accnoTH4582en_US
dc.identifier.citationChathuranga, N.A.H.W.S. (2021). Aspects identification and sentiment analysis for code-mixed Sinhala-English social media comments in the telecommunication domain [Master's theses, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21196
dc.identifier.degreeMSc In Computer Science and Engineeringen_US
dc.identifier.departmentDepartment of Computer Science and Engineeringen_US
dc.identifier.facultyEngineeringen_US
dc.identifier.urihttp://dl.lib.uom.lk/handle/123/21196
dc.language.isoenen_US
dc.subjectSENTIMENT ANALYSISen_US
dc.subjectCAPSULE NETWORKen_US
dc.subjectBI DIRECTIONAL GRUen_US
dc.subjectCOMPUTER SCIENCE & ENGINEERING -Dissertationen_US
dc.subjectCOMPUTER SCIENCE -Dissertationen_US
dc.subjectINFORMATION TECHNOLOGY -Dissertationen_US
dc.titleAspects identification and sentiment analysis for code-mixed Sinhala-English social media comments in the telecommunication domainen_US
dc.typeThesis-Abstracten_US

Files

Original bundle

Name: TH4582-1.pdf
Size: 103.03 KB
Format: Adobe Portable Document Format
Description: Pre-Text

Name: TH4582-2.pdf
Size: 150.79 KB
Format: Adobe Portable Document Format
Description: Post-Text

Name: TH4582.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format
Description: Full-thesis