Question answering system for Sinhala language
Date
2023
Abstract
Natural Language Processing (NLP) has progressed rapidly in recent years, driven by advanced models such as BERT (Bidirectional Encoder Representations from Transformers) and its variants. However, NLP research and applications remain focused on high-resource languages, leaving low-resource languages with limited datasets and tools. This thesis presents the development of a question-answering (QA) system for Sinhala, a low-resource language, using state-of-the-art Transformer models. To build a Sinhala dataset, I translate SQuAD 2.0, one of the most popular English QA benchmark datasets, into Sinhala, then pre-process and adapt the translation into a SQuAD-like format. I describe how large pre-trained language models are fine-tuned for the QA task. The experimental results demonstrate the suitability of XLM-RoBERTa-Large for Sinhala and highlight the potential of Transformer models to address the challenges that low-resource languages face in NLP. The models are evaluated with standard QA metrics, F1 score and exact match (EM), which give insight into their performance and allow comparison with related work. The experiments show that taking a language model already fine-tuned for the downstream QA task and further fine-tuning it on the custom Sinhala data outperformed the other models, with a significant increase in both F1 and exact match scores. Finally, I discuss the limitations and suggest potential improvements and future research directions for building cutting-edge NLP models for low-resource languages, emphasizing the importance of empowering underrepresented languages in the NLP domain.
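The F1 and exact match scores mentioned in the abstract are, in SQuAD-style evaluation, computed per question by comparing the normalized predicted answer span with each reference answer and keeping the best score. The thesis's own evaluation script is not described here, so the following is only a minimal sketch under stated assumptions: answers are split on whitespace, the punctuation set (ASCII punctuation plus the Sinhala kunddaliya, U+0DF4) is an assumption, and the English article removal performed by the official SQuAD script is omitted because it does not apply to Sinhala. The helper names `normalize`, `exact_match`, `f1` and `best_over_golds` are hypothetical.

```python
import string
from collections import Counter

# Assumed punctuation set: ASCII punctuation plus the Sinhala kunddaliya (U+0DF4).
_PUNCT = set(string.punctuation) | {"\u0df4"}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace (hypothetical helper).

    Unlike the official SQuAD script, English articles are not removed,
    since that step is specific to English (assumption).
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # SQuAD 2.0 unanswerable questions: both empty scores 1.0, only one empty scores 0.0.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list[str]) -> float:
    """Score against every reference answer and keep the maximum, as SQuAD does."""
    if not golds:
        # No reference answers: treat the question as unanswerable.
        return metric(prediction, "")
    return max(metric(prediction, g) for g in golds)


if __name__ == "__main__":
    print(exact_match("කොළඹ", "කොළඹ."))        # punctuation is ignored
    print(f1("කොළඹ නගරය", "කොළඹ"))             # partial overlap
    print(best_over_golds(f1, "ගාල්ල", ["කොළඹ", "ගාල්ල"]))
```

For example, predicting "කොළඹ නගරය" against the reference "කොළඹ" gives precision 1/2 and recall 1, so F1 is about 0.67 while exact match is 0. Corpus-level scores would then be the mean of these per-question values.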
Citation
Mahinda, H.L.A.V. (2023). Question answering system for Sinhala language [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/23984
