Improving speech to intent classification using self supervised learning for LRL in a specific domain

dc.contributor.advisor: Thayasivam, U
dc.contributor.author: Inthirakumaaran, T
dc.date.accept: 2023
dc.date.accessioned: 2025-05-29T08:54:00Z
dc.date.issued: 2023
dc.description.abstract: Speech-related applications have become increasingly prevalent in various fields, such as speech topic identification and spoken command recognition. With advancements in technology, Automatic Speech Recognition (ASR) has made significant progress, with some recent research even showing performance similar to humans in certain applications, to the extent that the latest state-of-the-art ASR models can now easily recognize free-form speech. However, constructing an ASR for a language can be a demanding task, requiring considerable resources and time. Thus, not all languages have access to ASR models, particularly Low Resource Languages (LRL). To achieve better results for LRL, domain-specific models can be created, and previous studies have explored this approach while addressing the scarcity of data. Some studies have even employed English phoneme-based ASR models as a basis for developing LRL ASR models. This approach entails adapting the English model to the target language by exploiting phoneme-level information, which can help alleviate the shortage of available data. In this research, we aim to analyze the possibility of applying the Wav2Vec 2.0 (W2V2) framework, which utilizes self-supervised learning and was introduced in 2020, to the medical domain for building an ASR for the Tamil language. We compared two different methods for applying the W2V2 model: a fine-tuning approach with a limited amount of labeled data (42 minutes) transcribed in Tamil, and a transfer learning technique. Our findings indicate that the fine-tuning method outperforms the transfer learning technique. Moreover, the model showed a significant increase in accuracy compared to the existing state-of-the-art phoneme-based speech intent classification methodology for low resource languages. We also analysed the impact of labeled data size and the number of intents on the accuracy of intent classification in a complex domain. This study presents a significant step forward in enhancing speech recognition capabilities for the LRL community.
dc.identifier.accno: TH5312
dc.identifier.citation: Inthirakumaaran, T (2023). Improving speech to intent classification using self supervised learning for LRL in a specific domain [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/23570
dc.identifier.degree: MSc in Computer Science
dc.identifier.department: Department of Computer Science & Engineering
dc.identifier.faculty: Engineering
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/23570
dc.language.iso: en
dc.subject: AUTOMATIC SPEECH RECOGNITION
dc.subject: ASR
dc.subject: LOW-RESOURCE-LANGUAGE
dc.subject: INTENT CLASSIFICATION
dc.subject: WAV2VEC 2.0 FRAMEWORK
dc.title: Improving speech to intent classification using self supervised learning for LRL in a specific domain
dc.type: Thesis-Abstract
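
The abstract above describes fine-tuning a pretrained Wav2Vec 2.0 model for speech intent classification using a small amount of labeled audio. A minimal sketch of that kind of fine-tuning setup is given below, assuming the Hugging Face transformers library; the pretrained checkpoint name, the number of intents, and the training-step function are illustrative assumptions, not the thesis implementation.

# Illustrative sketch only: fine-tune a pretrained Wav2Vec 2.0 encoder for
# intent classification on a small labeled set. Checkpoint name, intent count,
# and hyperparameters are assumptions, not the thesis code.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

NUM_INTENTS = 10  # hypothetical number of medical-domain intents

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53", num_labels=NUM_INTENTS
)
model.freeze_feature_encoder()  # keep the low-level CNN feature encoder frozen
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(waveform, intent_id):
    """One gradient step on a single 16 kHz utterance and its intent label."""
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    out = model(input_values=inputs.input_values,
                labels=torch.tensor([intent_id]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

In practice, a loop of this kind would run over batches drawn from the limited transcribed Tamil audio (42 minutes) mentioned in the abstract, with the usual validation split to monitor intent-classification accuracy.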

Files

Original bundle

Name: TH5212-1.pdf
Size: 139.47 KB
Format: Adobe Portable Document Format
Description: Pre-text
Name: TH5212-2.pdf
Size: 245.54 KB
Format: Adobe Portable Document Format
Description: Post-text
Name: TH5212.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission