A deep learning ensemble hate speech detection approach for Sinhala tweets

Munasinghe MISA

dc.contributor.advisor	Thayasivam U
dc.contributor.author	Munasinghe MISA
dc.date.accessioned	2022
dc.date.available	2022
dc.date.issued	2022
dc.identifier.citation	Munasinghe, M.I.S.A. (2022). A deep learning ensemble hate speech detection approach for Sinhala tweets [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21857
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/21857
dc.description.abstract	We live in an era in which social media platforms play a key role in society. With the advancement of technology, these platforms have become closer to people, and users can now interact with them in most native languages, including Sinhala. This has enabled people to express their opinions more conveniently. At the same time, it is very common to see people express hateful and offensive opinions on social media platforms, and in certain applications it is mandatory to block this kind of content. Several studies have addressed this problem for the Sinhala language with traditional machine learning models, and none of them has shown promising results. Further, current approaches are far behind the latest techniques applied to high-resource languages such as English. Hence, this study presents a deep learning-based approach to hate speech detection, a class of approaches that has shown outstanding results for other languages. Three deep learning models with proven performance in the Natural Language Processing domain, namely LSTM, CNN and BiGRU, are considered. In addition, a deep learning ensemble was constructed from these three models to evaluate whether the ensemble technique can further improve model performance. The models were trained and tested on a newly created dataset collected using the Twitter API, and their generalizability was further tested by applying them to a completely new dataset. The results clearly show that the deep learning-based approach outperforms the traditional machine learning models. Moreover, the generalizability tests reveal that this approach generalizes better and produces better predictions than the prior approaches.
Finally, this study experiments with extra features in addition to the tweet content, such as retweet count and favourited count, to evaluate whether they can be used to improve performance further. The results obtained in this study show that using these extra features has an impact on performance. Further experiments in this area are recommended for future studies.	en_US
dc.language.iso	en	en_US
dc.subject	DEEP LEARNING	en_US
dc.subject	SPEECH DETECTION	en_US
dc.subject	SINHALA TWEETS	en_US
dc.subject	INFORMATION TECHNOLOGY -Dissertation	en_US
dc.subject	COMPUTER SCIENCE -Dissertation	en_US
dc.subject	COMPUTER SCIENCE & ENGINEERING -Dissertation	en_US
dc.title	A deep learning ensemble hate speech detection approach for Sinhala tweets	en_US
dc.type	Thesis-Abstract	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	MSc In Computer Science and Engineering	en_US
dc.identifier.department	Department of Computer Science and Engineering	en_US
dc.date.accept	2022
dc.identifier.accno	TH4942	en_US