Low resource multi-ASR speech command recognition

Mohamed, I; Thayasivam, U

dc.contributor.author	Mohamed, I
dc.contributor.author	Thayasivam, U
dc.contributor.editor	Rathnayake, M
dc.contributor.editor	Adhikariwatte, V
dc.contributor.editor	Hemachandra, K
dc.date.accessioned	2022-10-27T08:39:40Z
dc.date.available	2022-10-27T08:39:40Z
dc.date.issued	2022-07
dc.identifier.citation	I. Mohamed and U. Thayasivam, "Low Resource Multi-ASR Speech Command Recognition," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906230.	en_US
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/19270
dc.description.abstract	Spoken language understanding (SLU) has several applications, such as topic identification and intent detection. One of the primary underlying components used in SLU studies is automatic speech recognition (ASR). In recent years, ASR systems have improved markedly at recognizing spoken utterances. However, ASR remains challenging for low-resource languages, because training an ASR model requires hundreds of hours of audio. To overcome this issue, recent studies have used transfer learning techniques. Even so, the errors produced by the ASR models significantly affect the downstream natural language understanding (NLU) models used for intent or topic identification. In this work, we propose a multi-ASR setup to overcome this issue. We show that combining outputs from multiple ASR models yields significantly higher accuracy on low-resource speech-command transfer-learning tasks than using the output of a single ASR model. We present CNN-based setups that can utilize outputs from pre-trained ASR models such as DeepSpeech2 and Wav2Vec 2.0. The experimental results show an 8% increase in accuracy over the current state-of-the-art phoneme-based speech intent classification methodology for low-resource speech commands.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.uri	https://ieeexplore.ieee.org/document/9906230	en_US
dc.subject	Speech Intent Classification	en_US
dc.subject	Low-Resource	en_US
dc.subject	DeepSpeech2	en_US
dc.subject	Wav2Vec2.0	en_US
dc.subject	Tamil	en_US
dc.title	Low resource multi-ASR speech command recognition	en_US
dc.type	Conference-Full-text	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.department	Engineering Research Unit, University of Moratuwa	en_US
dc.identifier.year	2022	en_US
dc.identifier.conference	Moratuwa Engineering Research Conference 2022	en_US
dc.identifier.place	Moratuwa, Sri Lanka	en_US
dc.identifier.proceeding	Proceedings of Moratuwa Engineering Research Conference 2022	en_US
dc.identifier.email	jazeem.20@cse.mrt.ac.lk
dc.identifier.email	rtuthaya@cse.mrt.ac.lk
dc.identifier.doi	10.1109/MERCon55799.2022.9906230	en_US