Combining automatic speech recognition models to reduce error propagation in low-resource transfer-learning speech-command recognition
dc.contributor.advisor | Thayasivam U | |
dc.contributor.author | Isham JM | |
dc.date.accept | 2022 | |
dc.date.accessioned | 2022 | |
dc.date.available | 2022 | |
dc.date.issued | 2022 | |
dc.description.abstract | There are several applications when it comes to spoken language understanding, such as topic modeling and intent detection. One of the primary underlying components used in spoken language understanding studies is the automatic speech recognition model. In recent years we have seen major improvements in the ability of automatic speech recognition systems to recognize spoken utterances. However, this remains a challenging task for low-resource languages, as training an automatic speech recognition model requires hundreds of hours of audio input. To overcome this issue, recent studies have used transfer learning techniques. However, the errors produced by automatic speech recognition models significantly affect the downstream natural language understanding models used for intent or topic identification. In this work, we propose a multi-automatic speech recognition setup to overcome this issue. We show that combining outputs from multiple automatic speech recognition models yields significantly higher accuracy on low-resource speech-command transfer-learning tasks than using the output of a single automatic speech recognition model. We introduce convolutional neural network-based setups that utilize outputs from pre-trained automatic speech recognition models such as DeepSpeech2 and Wav2Vec 2.0. The experimental results show a 7% increase in accuracy over the current state-of-the-art low-resource phoneme-based speech-command intent classification methodology. | en_US |
dc.identifier.accno | TH4941 | en_US |
dc.identifier.citation | Isham, J.M. (2022). Combining automatic speech recognition models to reduce error propagation in low-resource transfer-learning speech-command recognition [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21854 | |
dc.identifier.degree | MSc In Computer Science and Engineering | en_US |
dc.identifier.department | Department of Computer Science and Engineering | en_US |
dc.identifier.faculty | Engineering | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/21854 | |
dc.language.iso | en | en_US |
dc.subject | SPEECH-COMMAND RECOGNITION | en_US |
dc.subject | FEATURE CONCATENATION | en_US |
dc.subject | LOW-RESOURCE TRANSFER LEARNING | en_US |
dc.subject | SPEECH RECOGNITION | en_US |
dc.subject | INFORMATION TECHNOLOGY -Dissertation | en_US |
dc.subject | COMPUTER SCIENCE -Dissertation | en_US |
dc.subject | COMPUTER SCIENCE & ENGINEERING -Dissertation | en_US |
dc.title | Combining automatic speech recognition models to reduce error propagation in low-resource transfer-learning speech-command recognition | en_US |
dc.type | Thesis-Abstract | en_US |
Files
Original bundle (1 - 3 of 3)
- TH4941-1.pdf | 139.46 KB | Adobe Portable Document Format | Pre-Text
- TH4941-2.pdf | 102.99 KB | Adobe Portable Document Format | Post-Text
- TH4941.pdf | 1013.22 KB | Adobe Portable Document Format | Full thesis
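The abstract above describes combining outputs from pre-trained automatic speech recognition models (DeepSpeech2 and Wav2Vec 2.0) through a convolutional neural network-based setup with feature concatenation. The following is a minimal, hypothetical sketch of that idea only; the embedding dimensions, layer sizes, command count, and the assumption that fixed-length utterance features have been extracted offline from each ASR model are illustrative and are not taken from the thesis.

import torch
import torch.nn as nn

class MultiASRIntentClassifier(nn.Module):
    """Concatenates features from two ASR back-ends and classifies the speech command."""

    def __init__(self, ds2_dim=1024, w2v_dim=768, num_commands=31):
        super().__init__()
        # Treat the concatenated feature vector as a single-channel 1-D signal,
        # apply a small convolution, then a linear classification head.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(64),
        )
        self.head = nn.Linear(16 * 64, num_commands)

    def forward(self, ds2_feats, w2v_feats):
        # Feature concatenation: join the two ASR representations per utterance.
        x = torch.cat([ds2_feats, w2v_feats], dim=-1)   # (batch, ds2_dim + w2v_dim)
        x = self.conv(x.unsqueeze(1))                    # (batch, 16, 64)
        return self.head(x.flatten(1))                   # (batch, num_commands)

# Example forward pass with random stand-in features.
model = MultiASRIntentClassifier()
ds2 = torch.randn(8, 1024)   # stand-in for DeepSpeech2-derived utterance features
w2v = torch.randn(8, 768)    # stand-in for Wav2Vec 2.0-derived utterance features
logits = model(ds2, w2v)     # shape: (8, 31)

The thesis may combine the models at a different level (for example, transcripts or phoneme sequences rather than embeddings); this sketch only illustrates the concatenation-then-CNN pattern named in the abstract.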