dc.contributor.advisor Thayasivam, U
dc.contributor.author Kugathasan, K
dc.date.accessioned 2024-12-03T04:38:49Z
dc.date.available 2024-12-03T04:38:49Z
dc.date.issued 2023
dc.identifier.citation Kugathasan, K. (2023). Developing a retrieval-based Tamil language chatbot for closed domain [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22970
dc.identifier.uri http://dl.lib.uom.lk/handle/123/22970
dc.description.abstract Chatbots are conversational systems that interact with humans via natural language. They are frequently used to respond to user queries and provide the information users need. Building a highly functional chatbot requires a good corpus and a variety of language-related resources. Because Tamil is a low-resource language, those resources are not available for it. Additionally, because Tamil is a morphologically rich language, high inflection and free word order pose key challenges for Tamil chatbots. For these reasons, developing an effective end-to-end chat system is challenging even for a closed domain. This study introduces a novel method for building a chatbot in Tamil by leveraging a dataset extracted from Tamil banking website FAQ sections and extending it to encompass the language's morphological complexity and rich inflectional structure. An intent is assigned to each query, and a multiclass intent classifier is developed to classify user intent. The CNN-based classifier demonstrated the highest performance, achieving an accuracy of 98.72%. While previous works on short-text classification in Tamil focused on only a few classes and used very large datasets, our method produced a superior accuracy of over 98% using a small number of per-class examples, even with 56 classes and additional challenges such as class imbalance in the data. This shows that our approach is better than any other approach for short-text classification in Tamil. The major contribution of this research is the generation of the first-ever chat dataset for Tamil. Our research is the first of its kind in Tamil to show how an efficient context-less chatbot can be built using short-text classification. Although this project was done for the Tamil language and the banking domain, the approach can be applied to other low-resource languages and domains as well. en_US
dc.language.iso en en_US
dc.subject CHATBOTS
dc.subject NATURAL LANGUAGE PROCESSING
dc.subject CONVERSATIONAL SYSTEMS
dc.subject LOW-RESOURCE LANGUAGE
dc.subject COMPUTER SCIENCE - Dissertation
dc.subject COMPUTER SCIENCE & ENGINEERING - Dissertation
dc.subject MSc (Major Component Research)
dc.title Developing a retrieval-based Tamil language chatbot for closed domain en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree Master of Science (Major Component of Research) en_US
dc.identifier.department Department of Computer Science & Engineering en_US
dc.date.accept 2023
dc.identifier.accno TH5438 en_US

