A deep learning ensemble hate speech detection approach for Sinhala tweets

dc.contributor.advisor: Thayasivam U
dc.contributor.author: Munasinghe MISA
dc.date.accept: 2022
dc.date.accessioned: 2022
dc.date.available: 2022
dc.date.issued: 2022
dc.description.abstract: We live in an era in which social media platforms play a key role in society. As technology has advanced, these platforms have become more accessible, and they now support most native languages, including Sinhala. This has made it easier for people to express their opinions. At the same time, hateful and offensive content is common on social media platforms, and in certain applications it is mandatory to block such content. Several studies have addressed this problem for the Sinhala language with traditional machine learning models, but none has shown promising results. Further, current approaches lag far behind the latest techniques used for high-resource languages such as English. Hence, this study presents a deep learning-based approach to hate speech detection, a class of approach that has shown outstanding results for other languages. Three deep learning models with proven performance in the Natural Language Processing domain, namely LSTM, CNN and BiGRU, are considered. In addition, a deep learning ensemble was constructed from these three models to evaluate whether the ensemble technique can further improve performance. The models were trained and tested on a new dataset created using the Twitter API, and model generalizability was further tested on a completely new dataset. The results clearly show that the deep learning-based approach outperforms the traditional machine learning models. Further tests of generalizability reveal that this approach generalizes better and produces better predictions than prior approaches.
Finally, this study experiments with extra features in addition to the tweet content, such as retweet count and favourited count, to evaluate whether they can improve performance further. The results show that using extra features does affect performance. Further experiments in this area are recommended for future studies. [en_US]
dc.identifier.accno: TH4942 [en_US]
dc.identifier.citation: Munasinghe, M.I.S.A. (2022). A deep learning ensemble hate speech detection approach for Sinhala tweets [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21857
dc.identifier.degree: MSc In Computer Science and Engineering [en_US]
dc.identifier.department: Department of Computer Science and Engineering [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/21857
dc.language.iso: en [en_US]
dc.subject: DEEP LEARNING [en_US]
dc.subject: SPEECH DETECTION [en_US]
dc.subject: SINHALA TWEETS [en_US]
dc.subject: INFORMATION TECHNOLOGY -Dissertation [en_US]
dc.subject: COMPUTER SCIENCE -Dissertation [en_US]
dc.subject: COMPUTER SCIENCE & ENGINEERING -Dissertation [en_US]
dc.title: A deep learning ensemble hate speech detection approach for Sinhala tweets [en_US]
dc.type: Thesis-Abstract [en_US]

Files

Original bundle

Name: TH4942-1.pdf
Size: 120.6 KB
Format: Adobe Portable Document Format
Description: Pre-Text

Name: TH4942-2.pdf
Size: 80.43 KB
Format: Adobe Portable Document Format
Description: Post-Text

Name: TH4942.pdf
Size: 991.41 KB
Format: Adobe Portable Document Format
Description: Full-theses