Institutional-Repository, University of Moratuwa.  

A deep learning ensemble hate speech detection approach for Sinhala tweets

dc.contributor.advisor Thayasivam U
dc.contributor.author Munasinghe MISA
dc.date.accessioned 2022
dc.date.available 2022
dc.date.issued 2022
dc.identifier.citation Munasinghe, M.I.S.A. (2022). A deep learning ensemble hate speech detection approach for Sinhala tweets [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21857
dc.identifier.uri http://dl.lib.uom.lk/handle/123/21857
dc.description.abstract We live in an era in which social media platforms play a key role in society. As technology has advanced, these platforms have come closer to people, who can now interact with them in most native languages, including Sinhala. This has made it easier for people to express their opinions. At the same time, hateful and offensive opinions are commonly expressed on social media platforms, and in certain applications it is mandatory to block this kind of content. Several studies have addressed this problem for the Sinhala language using traditional machine learning models, but none of them has shown promising results. Furthermore, current approaches lag far behind the latest techniques applied to high-resource languages such as English. Hence, this study presents a deep learning-based approach to hate speech detection, an approach that has shown outstanding results for other languages. Three deep learning models with proven performance in the Natural Language Processing domain, namely LSTM, CNN and BiGRU, are considered. In addition, a deep learning ensemble was constructed from these three models to evaluate whether the ensemble technique can further improve model performance. The models were trained and tested on a newly created dataset collected using the Twitter API. Model generalizability was then tested by applying the approach to a completely new dataset. The results show that the deep learning-based approach outperforms the traditional machine learning models. Further tests of generalizability indicate that this approach generalizes better and produces better predictions than prior approaches.
Finally, this study experiments with extra features in addition to the tweet content, such as retweet count, favourited count, etc., to evaluate whether they can be used to improve performance further. The results obtained in this study show that using extra features has an impact on performance. Further experiments in this area are recommended for future studies. en_US
dc.language.iso en en_US
dc.subject DEEP LEARNING en_US
dc.subject SPEECH DETECTION en_US
dc.subject SINHALA TWEETS en_US
dc.subject INFORMATION TECHNOLOGY -Dissertation en_US
dc.subject COMPUTER SCIENCE -Dissertation en_US
dc.subject COMPUTER SCIENCE & ENGINEERING -Dissertation en_US
dc.title A deep learning ensemble hate speech detection approach for Sinhala tweets en_US
dc.type Thesis-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc In Computer Science and Engineering en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.date.accept 2022
dc.identifier.accno TH4942 en_US

