Speech embedding with segregation of paralinguistic information for low-resource languages

Ignatius A

dc.contributor.advisor	Thayasivam U
dc.contributor.author	Ignatius A
dc.date.accessioned	2021T03:26:20Z
dc.date.available	2021T03:26:20Z
dc.date.issued	2021
dc.identifier.citation	Ignatius, A. (2021). Speech embedding with segregation of paralinguistic information for low-resource languages [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22663
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/22663
dc.description.abstract	Speech embeddings produced by deep neural networks have yielded promising results in a variety of speech processing applications. However, performance in speech tasks such as automatic speech recognition and speech intent identification can degrade substantially when training and testing conditions differ. This is because, in addition to linguistic information, speech signals carry para-linguistic information, including speaker characteristics, emotional states, and accent. Variations in speaker traits and states compromise performance in speech recognition applications that require only linguistic information. Over the years, various approaches have attempted to disentangle the para-linguistic information that accompanies the linguistic information in speech. A commonly used strategy is to integrate speaker representations into speech recognition models to normalise speaker effects. Still, this strategy has received less attention in studies on speech-to-intent classification. Furthermore, these speaker normalisation techniques require large amounts of labeled speech data. Under low-resource settings, where only a limited number of speech samples is available for training, transfer learning strategies can be used. This study presents a speaker-invariant speech intent classification model using i-vector-based feature augmentation. We investigate the use of pre-trained acoustic models for transfer learning under low-resource settings. The proposed method is evaluated on banking-domain speech intent datasets in Sinhala and Tamil, along with the Fluent Speech Commands dataset. Experimental results show that the proposed method achieves better prediction in the speech-to-intent classification model.	en_US
dc.language.iso	en	en_US
dc.subject	LINGUISTIC
dc.subject	SPEECH-TO-INTENT
dc.subject	PARA-LINGUISTIC INFORMATION
dc.subject	SPEAKER REPRESENTATION
dc.subject	SPEECH RECOGNITION
dc.subject	COMPUTER SCIENCE & ENGINEERING – Dissertation
dc.subject	COMPUTER SCIENCE- Dissertation
dc.subject	MSc (Major Component Research)
dc.title	Speech embedding with segregation of paralinguistic information for low-resource languages	en_US
dc.type	Thesis-Abstract	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	Master of Science (Major Component of Research)	en_US
dc.identifier.department	Department of Computer Science & Engineering	en_US
dc.date.accept	2021
dc.identifier.accno	TH5107	en_US