Structuring the knowledge for systematic information retrieval - knowledge graph and machine learning approach

dc.contributor.advisorAmbegoda, T
dc.contributor.authorAhamed, MFS
dc.date.accept2025
dc.date.accessioned2026-02-09T09:06:56Z
dc.date.issued2025
dc.description.abstractThe COVID-19 pandemic has led to the publication of a massive number of research papers, making it hard for researchers to find relevant information quickly. This study addresses this problem by using knowledge graphs to organize and analyze data from the Kaggle CORD-19 dataset and AWS metadata. Over 401,270 PDF and 315,742 PMC JSON files were processed, supported by millions of metadata connections. Knowledge graphs were created to show relationships between topics, countries, institutions, authors, concepts, and sentiment scores, allowing researchers to explore the data in multiple ways. A BERT-based sentiment analysis model was used to assign sentiment scores to papers, adding 32,299 new connections to the graph. These scores grouped papers by similar tone and emotion, helping to uncover hidden patterns and trends. With these insights integrated into a combined knowledge graph, researchers can traverse connections across metadata properties such as authors, institutions, topics, and sentiment scores, broadening the scope of discovery within the CORD-19 dataset. Visualizations showed how papers are connected to different metadata properties, such as the countries where research originated, the institutions involved, and overlapping research themes. Concept graphs included confidence scores to show how strongly a paper is linked to a concept. Sentiment graphs added new layers of connections that go beyond traditional metadata. Statistics highlight the size and complexity of these graphs, with 453,633 country edges, 476,865 institutional edges, and 1,783,589 concept edges. In addition, the average connectivity per node increases after sentiment scores are added to the knowledge graph. This study shows that knowledge graphs are a powerful way to organize and explore large collections of research papers. Adding sentiment analysis improves the depth of analysis, making it easier to find valuable information and uncover new insights.
This method can be applied to other fields in the future, providing a strong tool for addressing global challenges by organizing and analyzing large datasets.
dc.identifier.accnoTH5996
dc.identifier.citationAhamed, M.F.S. (2025). Structuring the knowledge for systematic information retrieval - knowledge graph and machine learning approach [Master’s theses, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24823
dc.identifier.degreeMSc in Computer Science
dc.identifier.departmentDepartment of Computer Science & Engineering
dc.identifier.facultyEngineering
dc.identifier.urihttps://dl.lib.uom.lk/handle/123/24823
dc.language.isoen
dc.subjectKNOWLEDGE GRAPHS
dc.subjectSEMANTIC NETWORKS
dc.subjectMACHINE LEARNING
dc.subjectINFORMATION RETRIEVAL
dc.subjectCOVID-19 OPEN RESEARCH DATASET
dc.subjectSENTIMENT ANALYSIS
dc.subjectCOMPUTER SCIENCE-Dissertation
dc.subjectCOMPUTER SCIENCE AND ENGINEERING-Dissertation
dc.subjectMSc in Computer Science
dc.titleStructuring the knowledge for systematic information retrieval - knowledge graph and machine learning approach
dc.typeThesis-Full-text
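
The abstract describes linking papers to countries, institutions, concepts (with confidence scores), and sentiment scores, and reports that average connectivity per node rises once sentiment edges are added. The following is a minimal Python sketch of that idea, not the thesis's actual pipeline. The class KnowledgeGraph, the helper sentiment_bucket and its thresholds, the function build_graph, and the three sample papers are all hypothetical illustrations; the sentiment values are made-up numbers, not output of the BERT-based model.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny undirected graph whose nodes are (kind, name) pairs."""

    def __init__(self):
        self.adj = defaultdict(set)  # node -> set of neighbouring nodes
        self.edges = {}              # frozenset({u, v}) -> attribute dict

    def add_edge(self, u, v, relation, **attrs):
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.edges[frozenset((u, v))] = {"relation": relation, **attrs}

    def average_degree(self):
        # Each undirected edge adds one to the degree of two nodes.
        return 2 * len(self.edges) / len(self.adj) if self.adj else 0.0

    def papers_linked_to(self, node):
        neighbours = self.adj.get(node, set())
        return sorted(name for kind, name in neighbours if kind == "paper")


def sentiment_bucket(score):
    """Assumed bucketing of a score in [-1, 1]; thresholds are illustrative."""
    if score > 0.25:
        return "positive"
    if score < -0.25:
        return "negative"
    return "neutral"


# Hypothetical sample records (not taken from CORD-19).
PAPERS = [
    {"id": "p1", "country": "Country X", "institution": "Institute A",
     "concepts": [("vaccine", 0.9)], "sentiment": 0.6},
    {"id": "p2", "country": "Country Y", "institution": "Institute B",
     "concepts": [("vaccine", 0.7), ("transmission", 0.8)], "sentiment": 0.5},
    {"id": "p3", "country": "Country X", "institution": "Institute A",
     "concepts": [("transmission", 0.6)], "sentiment": -0.4},
]


def build_graph(papers, with_sentiment):
    graph = KnowledgeGraph()
    for p in papers:
        paper = ("paper", p["id"])
        graph.add_edge(paper, ("country", p["country"]), "originated_in")
        graph.add_edge(paper, ("institution", p["institution"]), "affiliated_with")
        for concept, confidence in p["concepts"]:
            # The confidence score is stored as an edge attribute.
            graph.add_edge(paper, ("concept", concept), "mentions",
                           confidence=confidence)
        if with_sentiment:
            bucket = sentiment_bucket(p["sentiment"])
            graph.add_edge(paper, ("sentiment", bucket), "has_sentiment",
                           score=p["sentiment"])
    return graph


if __name__ == "__main__":
    before = build_graph(PAPERS, with_sentiment=False)
    after = build_graph(PAPERS, with_sentiment=True)
    print("average degree without sentiment:", before.average_degree())
    print("average degree with sentiment:", after.average_degree())
    print("positive papers:", after.papers_linked_to(("sentiment", "positive")))
```

On this toy data the average degree is higher with the sentiment edges than without them, and papers that share no country, institution, or concept can still be reached from one another through a shared sentiment node. Whether the same holds on a real corpus depends on how the scores are bucketed.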

Files

Original bundle

TH5996-1.pdf (993.22 KB, Adobe Portable Document Format): Pre-text
TH5996-2.pdf (311.29 KB, Adobe Portable Document Format): Post-text
TH5996.pdf (4.08 MB, Adobe Portable Document Format): Full-thesis

License bundle

license.txt (1.71 KB, Item-specific license agreed upon to submission)