Master of Science by research

Permanent URI for this collection: http://192.248.9.226/handle/123/22435

Recent Submissions

  • item: Thesis-Abstract
    Early detection of Sinhala language fake news in social media networks
    (2024) Hathnapitiya, H.G.H.S; Ahangama S; Adikari S
    With human evolution, people invented new technologies to make life easier. In the early twentieth century, people read newspapers, listened to the radio, and watched television to gather information. As technology matured, social media platforms were introduced to connect people. Busy modern users began to browse and rely on these platforms for news while losing interest in traditional media. Social media is easy to access and cost-effective, but these platforms can also be used effortlessly to propagate fake news and mislead people for personal, political, or religious benefit. Society therefore needs a proper mechanism to prevent the spread of false information. Human experts can address the issue by manually investigating news content; however, this requires many experts and is time-consuming. This study introduces an automated system that detects Sinhala fake news on social media at the time the content is published. The dataset was created by gathering news from Facebook that had been proven fake by Sri Lankan fact-checkers or legitimate by Sri Lankan news broadcasting channels. The proposed method uses content-related features with deep learning and machine learning techniques. The deep learning model combines Sinhala POS tags and their TF-IDF values with XLM-R embeddings and achieved 86% accuracy. The machine learning approach uses TF-IDF values of Sinhala POS tags, FastText embeddings, and punctuation counts, and achieved 85% accuracy. The proposed methods can identify fake news early, helping to prevent its spread. Performance could be further improved by collecting more data to enlarge the dataset.
    Keywords: Sinhala fake news, social media, content-related features, natural language processing (NLP), deep learning (DL), machine learning (ML)
  • item: Thesis-Abstract
    Analysing information quality of Wikipedia articles
    (2022) Sirisoma WCS; Ahangama Supunmali; Ahangama Sapumal
    User Generated Content (UGC) has grown in significance for information sharing since the introduction of Web 2.0. Wikipedia is one of the largest UGC databases in the world and the largest community-based collaborative encyclopedia ever created. However, Wikipedia's open and collaborative structure raises a serious information quality (IQ) concern. Malicious users exploit Wikipedia's popularity on the World Wide Web (WWW) to carry out malicious activities such as link spamming. Wikipedia is therefore often discouraged for use in academic activities and research. However, some articles are of high quality and rich in information. Existing methods for determining Wikipedia's IQ have used statistical models and machine learning algorithms, but the outcomes of these models are not satisfactory. Therefore, this study presents a novel theoretical model for evaluating IQ, based on Google's E-A-T framework. The model comprises three IQ constructs: Expertise, Authority, and Trustworthiness. A collection of IQ dimensions that affect these three constructs, together with 45 IQ attributes for assessing the dimensions, was identified and presented based on empirical findings and study results. A Selenium 3.14 web automation script was used to extract the IQ attributes automatically and inexpensively from the content and metadata statistics of Wikipedia articles. The study employed a sample of 2000 articles from six WikiProjects, comprising 1000 Featured Articles (FA) and 1000 non-FA articles. The proposed model was compared with three previously published models in terms of classification and clustering accuracy. It achieved classification and clustering accuracies of 95% and 93% respectively, a drastic improvement over the existing models. Furthermore, an average inter-rater agreement of 84% was observed. Accordingly, this comprehensive experiment validates the effectiveness of the proposed model. This study contributes to the related knowledge area by introducing a novel framework to assess the IQ of Wikipedia articles.
  • item: Thesis-Abstract
    Named entity boundary detection for Sinhala
    (2022) Priyadarshana YHPP; Ranathunga L
    Named entity recognition (NER) is a sequential classification task of considerable importance in current research. NER is the foundation for many common natural language processing (NLP) tasks such as information extraction, information retrieval, and semantic role labelling. Although many attempts have been made at NE type detection, NE boundary detection still leaves many avenues to explore. Analysing Sinhala content published on social media has also grown in importance owing to increased human involvement in the recent past. The ultimate goal of this work, a deep neural framework for named entity boundary detection, was achieved, and the model was tested on Sinhala statements extracted from social media. Several objectives were set to accomplish this task with respect to the existing baselines, and several novelties distinguish the approach. Specifically, the novel concept of “Boundary Bubbles” is used to identify entity mentions by considering the head word of each identified named entity. Experiments were conducted under multiple evaluation criteria, and the named entity boundary detection model performs well, with averages of 71% precision, 67% recall, and 63% F1 measured against the existing benchmarks. Hence this novel framework can serve as a valuable solution for named entity boundary detection in support of various computational activities on social media.
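
The boundary detection abstract above reports precision, recall, and F1. As a minimal sketch of how such scores are commonly computed for entity boundaries, the Python snippet below counts a predicted span as correct only when both its start and end match a gold span exactly. This exact-match criterion, the `boundary_prf` function name, and the example spans are assumptions made for illustration; the abstract does not state which matching criterion or averaging scheme the thesis used.

```python
from typing import Iterable, Set, Tuple

# A span is (start_token_index, end_token_index_exclusive).
Span = Tuple[int, int]


def boundary_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity boundary spans.

    Hypothetical helper: a predicted span is a true positive only if
    both of its boundaries agree with a gold span.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented example: three gold entities, four predictions, two exact matches.
gold_spans = [(0, 2), (5, 6), (9, 12)]
predicted_spans = [(0, 2), (5, 7), (9, 12), (14, 15)]
precision, recall, f1 = boundary_prf(gold_spans, predicted_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example the prediction `(5, 7)` overlaps the gold span `(5, 6)` but misses the right boundary, so it counts as an error under exact matching. Scores averaged over several test sets or criteria, as in the abstract, need not satisfy the harmonic-mean relation between the averaged precision and recall.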