A deep learning ensemble hate speech detection approach for Sinhala tweets

Munasinghe MISA

Files

TH4942-1.pdf (120.6 KB)

TH4942-2.pdf (80.43 KB)

TH4942.pdf (991.41 KB)

Date

2022

Authors

Munasinghe MISA

Abstract

We live in an era in which social media platforms play a key role in society. With the advancement of technology, these platforms have come closer to people, and users can now interact in most native languages, including Sinhala. This has made it more convenient for people to express their opinions. At the same time, people commonly post hateful and offensive opinions on social media platforms, and in certain applications it is mandatory to block this kind of content. Several studies on hate speech detection for the Sinhala language have used traditional machine learning models, and none of them has shown promising results. Further, current approaches are far behind the latest techniques used for high-resource languages such as English. Hence, this study presents a deep learning-based approach to hate speech detection, an approach that has shown outstanding results for other languages. Three deep learning models with proven performance in the Natural Language Processing domain, namely LSTM, CNN and BiGRU, are considered. In addition, a deep learning ensemble was constructed from these three models to evaluate whether the ensemble technique can further improve performance. The models were trained and tested on a new dataset created using the Twitter API, and model generalizability was further tested by applying the model to a completely new dataset. The results show that the deep learning-based approach outperforms the traditional machine learning models. The generalizability tests also indicate that this approach generalizes better and produces better predictions than prior approaches. Finally, this study experiments with features beyond the tweet content, such as retweet count and favourited count, to evaluate whether they can improve performance further. The results show that these extra features have an impact on performance, and further experiments in this area are recommended for future studies.
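The abstract does not say how the outputs of the three models are combined in the ensemble. As an illustration only, the sketch below assumes soft voting: each model emits a hate-speech probability per tweet, the probabilities are averaged, and the average is thresholded. The function name `soft_vote`, the 0.5 threshold, and the example scores are hypothetical and do not come from the thesis.

```python
from statistics import mean


def soft_vote(prob_by_model, threshold=0.5):
    """Average per-model probabilities for each tweet, then threshold.

    prob_by_model maps a model name to a list of hate-speech
    probabilities, one per tweet, in the same tweet order.
    Soft voting is an assumption here, not the thesis's stated method.
    """
    model_names = list(prob_by_model)
    n_tweets = len(prob_by_model[model_names[0]])
    if any(len(prob_by_model[m]) != n_tweets for m in model_names):
        raise ValueError("all models must score the same tweets")

    # Mean probability across models for each tweet.
    averaged = [
        mean(prob_by_model[m][i] for m in model_names)
        for i in range(n_tweets)
    ]
    # Label 1 = hate speech, 0 = not hate speech.
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Made-up probabilities for three tweets (illustrative, not real results).
example_scores = {
    "lstm": [0.9, 0.2, 0.6],
    "cnn": [0.8, 0.4, 0.3],
    "bigru": [0.7, 0.3, 0.5],
}
averaged, labels = soft_vote(example_scores)
print(labels)
```

Averaging probabilities lets a confident model outweigh two uncertain ones, whereas majority voting on hard labels would not; which of the two, if either, the thesis uses would need to be checked against the full text.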

Keywords

DEEP LEARNING, SPEECH DETECTION, SINHALA TWEETS, INFORMATION TECHNOLOGY -Dissertation, COMPUTER SCIENCE -Dissertation, COMPUTER SCIENCE & ENGINEERING -Dissertation

Citation

Munasinghe, M.I.S.A. (2022). A deep learning ensemble hate speech detection approach for Sinhala tweets [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21857

URI

http://dl.lib.uom.lk/handle/123/21857

Collections

Master of Science in Computer science and Engineering
