Abstract:
User Generated Content (UGC) is growing in significance for information sharing
along with the introduction of Web 2.0. Being one of the largest UGC databases in the
world, Wikipedia also stands as the largest community-based collaborative
encyclopedia ever created. However, Wikipedia's open-source and collaborative
structure presents a serious information quality (IQ) concern. Malicious users take
advantage of Wikipedia's popularity on the World Wide Web (WWW) when
conducting malicious activities such as link spamming. The use of Wikipedia is therefore
often discouraged in academic work and research. However, some articles are
both rich in information and of high quality. Statistical
models and machine learning algorithms have been used in existing methods for
determining Wikipedia's IQ. However, the outcomes of these models are not
satisfactory. Therefore, in this study a novel theoretical model for evaluating IQ is
presented, based on Google's E-A-T framework. The model comprises three IQ
constructs: Expertise, Authority and Trustworthiness. A set of IQ dimensions that
affect these three constructs, together with 45 IQ attributes for assessing those
dimensions, was identified based on empirical findings and study
results. A Selenium 3.14 web automation script was used to automatically and
inexpensively extract the IQ attributes from Wikipedia articles' content and metadata
statistics. The study used a sample of 2000 articles from six WikiProjects,
comprising 1000 Featured Articles (FA) and 1000 non-FA articles. The proposed
model was compared with three previously published models in terms of
classification and clustering accuracy. It achieved
classification and clustering accuracies of 95% and 93% respectively, a substantial
improvement over the existing models. Furthermore, an average inter-rater agreement
of 84% was observed. Accordingly, this experiment supports the
effectiveness of the proposed model. This study contributes to the related knowledge
area by introducing a novel framework to assess Wikipedia articles’ IQ.
Citation:
Sirisoma, W.C.S. (2022). Analysing information quality of Wikipedia articles [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22436