Assessing the quality of information on Wikipedia articles using deep learning

dc.contributor.advisorAhangama, S
dc.contributor.authorGunathilaka, PDST
dc.date.accept2025
dc.date.accessioned2025-12-11T10:26:00Z
dc.date.issued2025
dc.description.abstractThe creation of user-generated content has increased with the modern development of the Internet. Wikipedia stands out as the world's largest open-source digital encyclopedia, offering free access to extensive knowledge. As of March 2025, Wikipedia has over 6.9 million English articles. However, its collaborative nature brings into question the accuracy and consistency of the information provided. English Wikipedia receives over 2 edits and over 4,000 page views every second, with 500 new articles per day, so maintaining quality standards remains a major challenge. Whereas traditional encyclopedias rely on expert review, Wikipedia depends on a collaborative editing process, making quality control more complex. While it hosts a vast wealth of information, skepticism persists among academics regarding its credibility as a reliable source. To address these concerns, this research proposes a content-based quality classification model for Wikipedia articles using deep learning. A feed-forward neural network is combined with a large language model-based embedding, text-embedding-ada-002, for classification. The model is trained on 5,810 English Wikipedia articles, comprising 2,905 high-quality Featured Articles (FA) and 2,905 lower-quality articles sampled randomly from the A, GA, B, C, Start and Stub classes. The supervised embedding-based binary classification model achieved an accuracy of 96.56%, with precision, recall and F1 scores of 0.9448, 0.9885 and 0.9661 respectively, demonstrating the model's effectiveness and robustness in assessing content quality. The proposed model can help contributors enhance article quality and help readers identify reliable, trustworthy information, while strengthening Wikipedia's credibility. Future research should explore integrating metadata and extending the approach to multilingual Wikipedia editions.
dc.identifier.accnoTH5963
dc.identifier.citationGunathilaka, P.D.S.T. (2025). Assessing the quality of information on Wikipedia articles using deep learning [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24586
dc.identifier.degreeMSc in Information Technology
dc.identifier.departmentDepartment of Information Technology
dc.identifier.facultyIT
dc.identifier.urihttps://dl.lib.uom.lk/handle/123/24586
dc.language.isoen
dc.subjectWIKIPEDIA
dc.subjectWIKIPEDIA-Contents-Quality Control
dc.subjectNEURAL NETWORKS
dc.subjectFEED FORWARD
dc.subjectDEEP LEARNING
dc.subjectLARGE LANGUAGE MODELS
dc.subjectGPT 3.5
dc.subjectINFORMATION TECHNOLOGY-Dissertation
dc.subjectMSc in Information Technology
dc.titleAssessing the quality of information on Wikipedia articles using deep learning
dc.typeThesis-Abstract
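
The abstract above outlines the approach at a high level: each article's text is mapped to a text-embedding-ada-002 vector, and a feed-forward neural network performs binary classification (Featured Article vs. lower-quality classes). The following is a minimal illustrative sketch of such a pipeline, not the thesis implementation; the hidden-layer sizes, dropout rate, and training loop are assumptions, as the abstract does not specify them.

# Minimal sketch (assumptions throughout): ada-002 embeddings feeding a
# small feed-forward binary classifier. Not the thesis's actual code.
import torch
import torch.nn as nn
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def embed(text: str) -> torch.Tensor:
    """Return the 1536-dimensional text-embedding-ada-002 vector for an article's text."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return torch.tensor(resp.data[0].embedding, dtype=torch.float32)

class QualityClassifier(nn.Module):
    """Feed-forward network on top of frozen LLM embeddings (layer sizes are illustrative)."""
    def __init__(self, dim: int = 1536):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),  # single logit: high-quality (FA) vs. lower-quality
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train(model, X, y, epochs: int = 20, lr: float = 1e-3):
    """Train on X: (N, 1536) embeddings, y: (N, 1) labels (1 = FA, 0 = lower class)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

At inference time, an article's embedding is passed through the trained network and the sigmoid of the logit is thresholded (e.g. at 0.5) to yield a high- or low-quality label.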

Files

Original bundle

TH5963-1.pdf (989.29 KB, Adobe Portable Document Format): Pre-text
TH5963-2.pdf (213.08 KB, Adobe Portable Document Format): Post-text
TH5963.pdf (4.01 MB, Adobe Portable Document Format): Full-thesis

License bundle

license.txt (1.71 KB): Item-specific license agreed to upon submission