Assessing the quality of information on Wikipedia articles using deep learning

Date

2025

Abstract

The creation of user-generated content has increased with the modern development of the Internet. Wikipedia stands out as the world's largest open-source digital encyclopedia, offering free access to extensive knowledge. As of March 2025, Wikipedia has over 6.9 million English articles. However, its collaborative nature calls into question the accuracy and consistency of the information provided. English Wikipedia receives over 2 edits and over 4,000 page views every second, and gains 500 new articles per day. At that scale, maintaining quality standards remains a major challenge. Whereas traditional encyclopedias rely on expert review, Wikipedia depends on a collaborative editing process, which makes quality control more complex. Although Wikipedia offers a wealth of information, skepticism persists among academics regarding its credibility as a reliable source. To address these concerns, this research proposes a content-based quality classification model for Wikipedia articles using deep learning. A feed-forward neural network is combined with a large language model-based embedding, text-embedding-ada-002, for classification. The model is trained on 5810 English Wikipedia articles: 2905 high-quality featured articles (FA) and 2905 low-quality articles drawn at random from the A, GA, B, C, Start and Stub classes. The supervised embedding-based binary classification model achieved an accuracy of 96.56%, with precision, recall and F1 scores of 0.9448, 0.9885 and 0.9661 respectively, indicating that the model is effective and robust in assessing content quality. The proposed model can help contributors enhance article quality and help readers identify reliable, trustworthy information, while strengthening Wikipedia’s credibility. Future research should explore integrating metadata and extending the approach to multilingual Wikipedia editions.
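
The pipeline described in the abstract (fixed-length article embeddings fed to a feed-forward binary classifier, then scored with accuracy, precision, recall and F1) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state the network's layer sizes, optimizer or training schedule, so the single hidden layer, learning rate and epoch count below are hypothetical, and random 32-dimensional vectors stand in for the text-embedding-ada-002 vectors, which are not computed here.

```python
import numpy as np


def make_synthetic_embeddings(n_per_class=200, dim=32, seed=0):
    """Hypothetical stand-in for article embeddings: two Gaussian clusters."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(0.5, 1.0, size=(n_per_class, dim))   # label 1 ("featured")
    neg = rng.normal(-0.5, 1.0, size=(n_per_class, dim))  # label 0 ("other")
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(n_per_class, dtype=int),
                        np.zeros(n_per_class, dtype=int)])
    order = rng.permutation(len(y))
    return X[order], y[order]


def train_ffnn(X, y, hidden=16, lr=0.1, epochs=300, seed=0):
    """One-hidden-layer ReLU network trained by full-batch gradient descent.

    The architecture and hyperparameters are assumptions, not the thesis's.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        h = np.maximum(0.0, X @ W1 + b1)            # hidden activations
        z = np.clip(h @ W2 + b2, -30.0, 30.0)       # logits, clipped for stability
        p = 1.0 / (1.0 + np.exp(-z))                # sigmoid probability of label 1
        dz = (p - t) / n                            # d(mean cross-entropy)/d(logits)
        dh = (dz @ W2.T) * (h > 0)                  # backpropagate through ReLU
        W2 -= lr * (h.T @ dz)
        b2 -= lr * dz.sum(axis=0)
        W1 -= lr * (X.T @ dh)
        b1 -= lr * dh.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Return 0/1 labels; a positive logit means probability above 0.5."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, X @ W1 + b1)
    return ((h @ W2 + b2).ravel() > 0.0).astype(int)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    X, y = make_synthetic_embeddings()
    split = int(0.8 * len(y))                       # simple 80/20 hold-out split
    params = train_ffnn(X[:split], y[:split])
    print(binary_metrics(y[split:], predict(params, X[split:])))
```

The scores printed by this sketch describe only the synthetic data and say nothing about the thesis's results. As a consistency check on the reported figures, the harmonic mean of the stated precision (0.9448) and recall (0.9885) is approximately 0.966, which agrees with the reported F1 score of 0.9661.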

Citation

Gunathilaka, P.D.S.T. (2025). Assessing the quality of information on Wikipedia articles using deep learning [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24586
