Institutional-Repository, University of Moratuwa.  

Speech embedding with segregation of paralinguistic information for low-resource languages


dc.contributor.advisor Thayasivam U
dc.contributor.author Ignatius A
dc.date.accessioned 2021T03:26:20Z
dc.date.available 2021T03:26:20Z
dc.date.issued 2021
dc.identifier.citation Ignatius, A. (2021). Speech embedding with segregation of paralinguistic information for low-resource languages [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22663
dc.identifier.uri http://dl.lib.uom.lk/handle/123/22663
dc.description.abstract Speech embeddings produced by deep neural networks have yielded promising results in a variety of speech processing applications. However, performance in speech tasks such as automatic speech recognition and speech intent identification can degrade considerably when there is a discrepancy between training and testing conditions. This is because, in addition to linguistic information, speech signals carry para-linguistic information, including speaker characteristics, emotional states, and accent. Variations in speaker traits and states compromise performance in speech recognition applications that require only linguistic information. Over the years, various approaches have attempted to disentangle the para-linguistic information that accompanies the linguistic information in speech. The most common strategy is to integrate speaker representations into speech recognition models to normalise speaker effects. This strategy has, however, received less attention in studies on speech-to-intent classification. Furthermore, these speaker normalisation techniques require large amounts of labeled speech data. Under low-resource settings, when only a limited number of speech samples is available for training, transfer learning strategies can be used. This study presents a speaker-invariant speech intent classification model using i-vector based feature augmentation. We investigate the use of pre-trained acoustic models for transfer learning under low-resource settings. The proposed method is evaluated on the banking-domain speech intent dataset in Sinhala and Tamil, along with the Fluent Speech Commands dataset. Experimental results show that the proposed method improves prediction in the speech-to-intent classification model. en_US
dc.language.iso en en_US
dc.subject LINGUISTIC
dc.subject SPEECH-TO-INTENT
dc.subject PARA-LINGUISTIC INFORMATION
dc.subject SPEAKER REPRESENTATION
dc.subject SPEECH RECOGNITION
dc.subject COMPUTER SCIENCE & ENGINEERING – Dissertation
dc.subject COMPUTER SCIENCE- Dissertation
dc.subject MSc (Major Component Research)
dc.title Speech embedding with segregation of paralinguistic information for low-resource languages en_US
dc.type Thesis-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree Master of Science (Major Component of Research) en_US
dc.identifier.department Department of Computer Science & Engineering en_US
dc.date.accept 2021
dc.identifier.accno TH5107 en_US

