Improving speech to intent classification using self supervised learning for LRL in a specific domain
| dc.contributor.advisor | Thayasivam, U | |
| dc.contributor.author | Inthirakumaaran, T | |
| dc.date.accept | 2023 | |
| dc.date.accessioned | 2025-05-29T08:54:00Z | |
| dc.date.issued | 2023 | |
| dc.description.abstract | Speech-related applications have become increasingly prevalent in various fields, such as speech topic identification and spoken command recognition. With advancements in technology, Automatic Speech Recognition (ASR) has made significant progress, with some recent research even showing performance similar to humans in certain applications, to the extent that the latest state-of-the-art ASR models can now easily identify free-form speech. However, constructing an ASR system for a language can be a demanding task, requiring considerable resources and time. Thus, not all languages have access to ASR models, particularly Low Resource Languages (LRL). To achieve better results for LRL, domain-specific models can be created, and previous studies have explored this approach while addressing the scarcity of data. Some studies have even employed English phoneme-based ASR models as a basis for developing LRL ASR models. This approach entails adapting the English model to the target language by exploiting phoneme-level information, which can help alleviate the shortage of available data. In this research, we aim to analyze the possibility of applying the Wav2Vec2.0 (W2V2) framework, which utilizes self-supervised learning and was introduced in 2020, to the medical domain for building an ASR for the Tamil language. We compared two different methods for applying the W2V2 model: a fine-tuning approach with a limited amount of labeled data (42 minutes) transcribed in Tamil, and a transfer learning technique. Our findings indicate that the fine-tuning method outperforms the transfer learning technique. Moreover, the model showed a significant increase in accuracy compared to the existing state-of-the-art phoneme-based speech intent classification methodology for low resource languages. We also analysed the impact of labeled data size and the number of intents on the accuracy of intent classification in a complex domain. This study presents a significant step forward in enhancing speech recognition capabilities for the LRL community. | |
| dc.identifier.accno | TH5312 | |
| dc.identifier.citation | Inthirakumaaran, T (2023). Improving speech to intent classification using self supervised learning for LRL in a specific domain [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/23570 | |
| dc.identifier.degree | MSc in Computer Science | |
| dc.identifier.department | Department of Computer Science & Engineering | |
| dc.identifier.faculty | Engineering | |
| dc.identifier.uri | https://dl.lib.uom.lk/handle/123/23570 | |
| dc.language.iso | en | |
| dc.subject | AUTOMATIC SPEECH RECOGNITION | |
| dc.subject | ASR | |
| dc.subject | LOW-RESOURCE-LANGUAGE | |
| dc.subject | INTENT CLASSIFICATION | |
| dc.subject | WAV2VEC 2.0 FRAMEWORK | |
| dc.title | Improving speech to intent classification using self supervised learning for LRL in a specific domain | |
| dc.type | Thesis-Abstract |
Files
Original bundle
- Name: TH5212-1.pdf
- Size: 139.47 KB
- Format: Adobe Portable Document Format
- Description: Pre-text
- Name: TH5212-2.pdf
- Size: 245.54 KB
- Format: Adobe Portable Document Format
- Description: Post-text
- Name: TH5212.pdf
- Size: 1.29 MB
- Format: Adobe Portable Document Format
- Description: Full-thesis
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission
