Improving speech to intent classification using self supervised learning for LRL in a specific domain

Inthirakumaaran, T

dc.contributor.advisor	Thayasivam, U
dc.contributor.author	Inthirakumaaran, T
dc.date.accept	2023
dc.date.accessioned	2025-05-29T08:54:00Z
dc.date.issued	2023
dc.description.abstract	Speech-related applications have become increasingly prevalent in various fields, such as speech topic identification and spoken command recognition. With advancements in technology, Automatic Speech Recognition (ASR) has made significant progress, with some recent research even showing performance similar to humans in certain applications, to the extent that the latest state-of-the-art ASR models can now easily identify free-form speech. However, constructing an ASR for a language can be a demanding task, requiring considerable resources and time. Thus not all languages have access to ASR models, particularly Low Resource Languages (LRL). To achieve better results for LRL, domain-specific models can be created, and previous studies have explored this approach while addressing the scarcity of data. Some studies have even employed English phoneme-based ASR models as a basis for developing LRL ASR models. This approach entails adapting the English model to the target language by exploiting phoneme-level information, which can help alleviate the shortage of available data. In this research, we aim to analyze the possibility of applying the Wav2Vec2.0 (W2V2) framework, which utilizes self-supervised learning and was introduced in 2020, to the medical domain for building an ASR for the Tamil language. We compared two different methods for applying the W2V2 model: a fine-tuning approach with a limited amount of labeled data (42 minutes) transcribed in Tamil, and a transfer learning technique. Our findings indicate that the fine-tuning method outperforms the transfer learning technique. Moreover, the model showed a significant increase in accuracy compared to the existing state-of-the-art phoneme-based speech intent classification methodology for low resource languages. We also analysed the impact of labeled data size and the number of intents on the accuracy of intent classification in a complex domain.
This study presents a significant step forward in enhancing speech recognition capabilities for the LRL community.
dc.identifier.accno	TH5312
dc.identifier.citation	Inthirakumaaran, T (2023). Improving speech to intent classification using self supervised learning for LRL in a specific domain [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/23570
dc.identifier.degree	MSc in Computer Science
dc.identifier.department	Department of Computer Science & Engineering
dc.identifier.faculty	Engineering
dc.identifier.uri	https://dl.lib.uom.lk/handle/123/23570
dc.language.iso	en
dc.subject	AUTOMATIC SPEECH RECOGNITION
dc.subject	ASR
dc.subject	LOW-RESOURCE-LANGUAGE
dc.subject	INTENT CLASSIFICATION
dc.subject	WAV2VEC 2.0 FRAMEWORK
dc.title	Improving speech to intent classification using self supervised learning for LRL in a specific domain
dc.type	Thesis-Abstract

Files

Original bundle

Name:: TH5212-1.pdf
Size:: 139.47 KB
Format:: Adobe Portable Document Format
Description:: Pre-text

Name:: TH5212-2.pdf
Size:: 245.54 KB
Format:: Adobe Portable Document Format
Description:: Post-text

Name:: TH5212.pdf
Size:: 1.29 MB
Format:: Adobe Portable Document Format
Description:: Full-thesis

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed upon to submission
Description:

Collections

Master of Science in Computer science and Engineering