Improving speech to intent classification using self supervised learning for LRL in a specific domain

Inthirakumaaran, T

Files

TH5212-1.pdf (139.47 KB)

TH5212-2.pdf (245.54 KB)

TH5212.pdf (1.29 MB)

Date

2023

Authors

Inthirakumaaran, T

Abstract

Speech-related applications have become increasingly prevalent in various fields, such as speech topic identification and spoken command recognition. With advancements in technology, Automatic Speech Recognition (ASR) has made significant progress, with some recent research even showing performance similar to humans in certain applications, to the extent that the latest state-of-the-art ASR models can now easily identify free-form speech. However, constructing an ASR for a language can be a demanding task, requiring considerable resources and time. Thus, not all languages have access to ASR models, particularly Low Resource Languages (LRL). To achieve better results for LRL, domain-specific models can be created, and previous studies have explored this approach while addressing the scarcity of data. Some studies have even employed English phoneme-based ASR models as a basis for developing LRL ASR models. This approach entails adapting the English model to the target language by exploiting phoneme-level information, which can help alleviate the shortage of available data. In this research, we aim to analyze the possibility of applying the Wav2Vec2.0 (W2V2) framework, which utilizes self-supervised learning and was introduced in 2020, to the medical domain for building an ASR for the Tamil language. We compared two different methods for applying the W2V2 model: a fine-tuning approach with a limited amount of labeled data (42 minutes) transcribed in Tamil, and a transfer learning technique. Our findings indicate that the fine-tuning method outperforms the transfer learning technique. Moreover, the model showed a significant increase in accuracy compared to the existing state-of-the-art phoneme-based speech intent classification methodology for low resource languages. We also analysed the impact of labeled data size and the number of intents on the accuracy of intent classification in a complex domain. This study presents a significant step forward in enhancing speech recognition capabilities for the LRL community.

Keywords

AUTOMATIC SPEECH RECOGNITION, ASR, LOW-RESOURCE-LANGUAGE, INTENT CLASSIFICATION, WAV2VEC 2.0 FRAMEWORK

Citation

Inthirakumaaran, T (2023). Improving speech to intent classification using self supervised learning for LRL in a specific domain [Master's theses, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/23570

URI

https://dl.lib.uom.lk/handle/123/23570

Collections

Master of Science in Computer Science and Engineering
