Abstract:
Social media platforms have emerged rapidly with technological advancements. Facebook, the most widely used social media platform, has been the primary reason for the spread of hatred in Sri Lanka in the recent past. When a post with Sinhala hate content is reported on Facebook, it is translated into English before moderators review it. In most instances, the translated content carries a different meaning from the original post. As a result, reviewers conclude that the reported post does not violate the established policies and guidelines concerning hate content. Hence, an effective approach is needed to address this problem. This research project proposes a solution in the form of an automated tool capable of detecting hate content in Sinhala phrases extracted from Facebook posts and memes. The tool accepts an image that contains Sinhala text, extracts the text using a Convolutional Neural Network (CNN) model, preprocesses the text using Natural Language Processing (NLP) techniques, analyzes the preprocessed text to identify its hate intensity level, and finally classifies the text into four main domains (Political, Race, Religion, and Gender) using a text classification model.
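The four stages described above (text extraction, preprocessing, intensity analysis, domain classification) can be sketched as a simple pipeline. This is only an illustrative outline, not the authors' implementation: the CNN extractor is represented by a caller-supplied function, and the intensity and domain steps use hypothetical lexicon and keyword lookups in place of the trained models. All names here (`detect`, `preprocess`, `score_intensity`, `classify_domain`, `Result`) are assumptions introduced for illustration.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Sinhala Unicode block is U+0D80..U+0DFF; U+200D (zero-width joiner)
# appears inside Sinhala conjuncts, so it is kept as well.
_NON_SINHALA = re.compile(r"[^\u0D80-\u0DFF\u200D\s]")


@dataclass
class Result:
    text: str                # raw text returned by the extractor
    tokens: List[str]        # preprocessed Sinhala tokens
    intensity: float         # placeholder score in [0, 1]
    domain: Optional[str]    # one of the four domains, or None if no match


def preprocess(raw: str) -> List[str]:
    """Normalize to NFC, drop non-Sinhala characters, split on whitespace."""
    text = unicodedata.normalize("NFC", raw)
    text = _NON_SINHALA.sub(" ", text)
    return text.split()


def score_intensity(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Hypothetical stand-in for the intensity model: mean lexicon weight."""
    if not tokens:
        return 0.0
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return min(1.0, total / len(tokens))


def classify_domain(tokens: List[str],
                    keywords: Dict[str, Set[str]]) -> Optional[str]:
    """Hypothetical stand-in for the text classifier: most keyword hits wins."""
    hits = {d: sum(tok in kws for tok in tokens) for d, kws in keywords.items()}
    best = max(hits, key=hits.get, default=None)
    return best if best is not None and hits[best] > 0 else None


def detect(image: bytes,
           extract_text: Callable[[bytes], str],
           lexicon: Dict[str, float],
           keywords: Dict[str, Set[str]]) -> Result:
    """Run the stages in order; extract_text stands in for the CNN extractor."""
    raw = extract_text(image)
    tokens = preprocess(raw)
    return Result(raw, tokens,
                  score_intensity(tokens, lexicon),
                  classify_domain(tokens, keywords))
```

In practice, `extract_text` would wrap the trained CNN text extractor, and the two lookup functions would be replaced by the trained intensity and classification models. Keeping each stage behind a plain function boundary makes it possible to swap or evaluate them independently.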
Citation:
E. Silva, M. Nandathilaka, S. Dalugoda, T. Amarasinghe, S. Ahangama and G. T. Weerasuriya, "Machine Learning-Based Automated Tool to Detect Sinhala Hate Speech in Images," 2021 6th International Conference on Information Technology Research (ICITR), 2021, pp. 1-7, doi: 10.1109/ICITR54349.2021.9657453.