dc.contributor.advisor |
Thayasivam, U |
|
dc.contributor.author |
Kugathasan, K |
|
dc.date.accessioned |
2024-12-03T04:38:49Z |
|
dc.date.available |
2024-12-03T04:38:49Z |
|
dc.date.issued |
2023 |
|
dc.identifier.citation |
Kugathasan, K. (2023). Developing a retrieval-based Tamil language chatbot for closed domain [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22970 |
|
dc.identifier.uri |
http://dl.lib.uom.lk/handle/123/22970 |
|
dc.description.abstract |
Chatbots are conversational systems that interact with humans via natural language. They are frequently used to respond to user queries and provide the information users need. Building a highly functional chatbot requires a good corpus and a variety of language-related resources. Because Tamil is a low-resource language, those resources are not available for it. In addition, Tamil is morphologically rich, so its high inflection and free word order pose key challenges for Tamil chatbots. For these reasons, developing an effective end-to-end chat system is challenging even for a closed domain. This study introduces a novel method for building a Tamil chatbot by leveraging a dataset extracted from the FAQ sections of Tamil banking websites and extending it to cover the language's morphological complexity and rich inflectional structure. An intent is assigned to each query, and a multiclass intent classifier is developed to classify user intent. The CNN-based classifier performed best, achieving an accuracy of 98.72%. While previous work on short-text classification in Tamil focused on only a few classes and used very large datasets, our method achieved an accuracy of over 98% with a small number of examples per class, even with 56 classes and additional challenges such as class imbalance in the data. This indicates that our approach outperforms previously reported approaches to short-text classification in Tamil. The major contribution of this research is the creation of the first chat dataset for Tamil. Our research is also the first in Tamil to show how an efficient context-less chatbot can be built using short-text classification. Although this project targets the Tamil language and the banking domain, the approach can be applied to other low-resource languages and domains as well. |
en_US |
dc.language.iso |
en |
en_US |
dc.subject |
CHATBOTS |
|
dc.subject |
NATURAL LANGUAGE PROCESSING |
|
dc.subject |
CONVERSATIONAL SYSTEMS |
|
dc.subject |
LOW-RESOURCE LANGUAGE |
|
dc.subject |
COMPUTER SCIENCE - Dissertation |
|
dc.subject |
COMPUTER SCIENCE & ENGINEERING - Dissertation |
|
dc.subject |
MSc (Major Component Research) |
|
dc.title |
Developing a retrieval-based Tamil language chatbot for closed domain |
en_US |
dc.type |
Thesis-Full-text |
en_US |
dc.identifier.faculty |
Engineering |
en_US |
dc.identifier.degree |
Master of Science (Major Component of Research) |
en_US |
dc.identifier.department |
Department of Computer Science & Engineering |
en_US |
dc.date.accept |
2023 |
|
dc.identifier.accno |
TH5438 |
en_US |