Abstract:
In the era of social media platforms such as YouTube, Facebook, and Twitter, commenting on
posts has become an enjoyable part of people's daily lives. However, comments are also
increasingly used to spread hate speech and organize hate-based activities. Identifying
harmful and offensive text on social media platforms has been a trending research area over
the last few years. In a country like Sri Lanka, which has multiple native languages, people
mostly comment on social media in their native language. Tamil is one of the languages
commonly spoken in the northern and eastern parts of Sri Lanka. In recent years,
people have commented not only in their native language but also in a mix of more than
one language. In Sri Lanka, people use Singlish (Sinhala + English) or Tanglish (Tamil +
English).
Because of the rapid growth of hateful content on social media, there is an immediate
need for an efficient and effective method to identify harmful content. A large body of
research has been done, and is ongoing, on the automated detection of harmful content
online. The complexity of natural language constructs makes this task very
challenging.
Most of this research has been done on the English language. This research work aims
to classify code-mixed Tamil comments on social media as harmful or non-harmful
using machine learning models.
Citation:
Sivalingam, D. (2022). Identifying harmful comments for Tamil language on social media [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/20325