Question answering system for Sinhala language

dc.contributor.advisor: Sumathipala, KASN
dc.contributor.author: Mahinda, HLAV
dc.date.accept: 2023
dc.date.accessioned: 2025-08-19T08:43:16Z
dc.date.issued: 2023
dc.description.abstract: Natural Language Processing (NLP) has seen rapid progress in recent years, thanks to the development of advanced models such as BERT (Bidirectional Encoder Representations from Transformers) and its variants. However, NLP research and applications remain focused on high-resource languages, leaving low-resource languages with limited resources and tools. In this thesis, I present the process of developing a question-answering (QA) system for Sinhala, a low-resource language, using state-of-the-art transformer models. To prepare the Sinhala dataset, I translate SQuAD 2.0, one of the most popular English benchmark QA datasets, into Sinhala, then pre-process and adapt the translation into a SQuAD-like format. I describe the fine-tuning of large pre-trained language models for the QA task. My experimental results demonstrate the suitability of XLM-RoBERTa-Large for the Sinhala language and highlight the potential of Transformer models to address the challenges faced by low-resource languages in NLP. I evaluate the model using standard QA metrics, such as F1 score and exact match score, to gain insight into its performance and compare it with related work. In these experiments, a language model already adapted to the downstream QA task and then fine-tuned on custom Sinhala data outperformed the other models, with a significant increase in F1 and exact match scores. Finally, I discuss the limitations and suggest potential improvements and future research directions for building cutting-edge NLP models for low-resource languages, emphasizing the significance of a commitment to empowering underrepresented languages in the NLP domain.
dc.identifier.accno: TH5395
dc.identifier.citation: Mahinda, H.L.A.V. (2023). Question answering system for Sinhala language [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/23984
dc.identifier.degree: MSc in Artificial Intelligence
dc.identifier.department: Department of Computational Mathematics
dc.identifier.faculty: IT
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/23984
dc.language.iso: en
dc.subject: NATURAL LANGUAGE PROCESSING
dc.subject: QUESTION-ANSWERING SYSTEMS
dc.subject: LOW-RESOURCE LANGUAGES
dc.subject: SINHALA LANGUAGE
dc.subject: TRANSLATION
dc.subject: LARGE LANGUAGE MODELS
dc.subject: SQuAD 2.0
dc.subject: READING COMPREHENSION
dc.subject: COMPUTATIONAL MATHEMATICS-Dissertation
dc.subject: MSc in Artificial Intelligence
dc.title: Question answering system for Sinhala language
dc.type: Thesis-Abstract
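
The abstract names F1 score and exact match as the evaluation metrics. As a rough illustration only, the sketch below shows how SQuAD-style token-level versions of these two metrics are commonly computed for a single predicted answer. The function names and the plain whitespace tokenization are assumptions made for this example; the thesis may normalize Sinhala text differently.

```python
from collections import Counter


def tokens(text):
    # Assumption: plain whitespace tokenization, with no further normalization.
    return text.split()


def exact_match(prediction, reference):
    # 1.0 when the token sequences are identical, otherwise 0.0.
    return float(tokens(prediction) == tokens(reference))


def f1_score(prediction, reference):
    pred, ref = tokens(prediction), tokens(reference)
    if not pred or not ref:
        # SQuAD 2.0 convention for unanswerable questions:
        # full credit only when both answers are empty.
        return float(pred == ref)
    # Count tokens shared by prediction and reference (multiset intersection).
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are usually the mean of these per-question values, taking the best score over the available reference answers for each question.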

Files

Original bundle

Name: TH5395-1.pdf
Size: 151.03 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH5395-2.pdf
Size: 80.8 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH5395.pdf
Size: 3.23 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission