Speech embedding with segregation of paralinguistic information for low-resource languages

dc.contributor.advisor: Thayasivam U
dc.contributor.author: Ignatius A
dc.date.accept: 2021
dc.date.accessioned: 2021T03:26:20Z
dc.date.available: 2021T03:26:20Z
dc.date.issued: 2021
dc.description.abstract: Speech embeddings produced by deep neural networks have yielded promising results in a variety of speech processing applications. However, performance in speech tasks such as automatic speech recognition and speech intent identification can degrade considerably when there is a discrepancy between training and testing conditions. This is because, in addition to linguistic information, speech signals carry para-linguistic information including speaker characteristics, emotional states, and accent. Variations in speaker traits and states compromise performance in speech recognition applications that require only linguistic information. Over the years, various approaches have attempted to disentangle the para-linguistic information that accompanies the linguistic information in speech. The commonly used strategy is to integrate speaker representations into speech recognition models to normalise the speaker effects. Still, this strategy has received less attention in studies on speech-to-intent classification. Furthermore, these speaker normalisation techniques require large amounts of labeled speech data. Under low-resource settings, when only a limited number of speech samples is available for training, transfer learning strategies can be used. This study presents a speaker-invariant speech intent classification model using i-vector based feature augmentation. We investigate the use of pre-trained acoustic models for transfer learning under low-resource settings. The proposed method is evaluated on the banking domain speech intent dataset in the Sinhala and Tamil languages, along with the Fluent Speech Commands dataset. Experimental results show that the proposed method achieves better prediction in the speech-to-intent classification model. [en_US]
dc.identifier.accno: TH5107 [en_US]
dc.identifier.citation: Ignatius, A. (2021). Speech embedding with segregation of paralinguistic information for low-resource languages [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22663
dc.identifier.degree: Master of Science (Major Component of Research) [en_US]
dc.identifier.department: Department of Computer Science & Engineering [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/22663
dc.language.iso: en [en_US]
dc.subject: LINGUISTIC
dc.subject: SPEECH-TO-INTENT
dc.subject: PARA-LINGUISTIC INFORMATION
dc.subject: SPEAKER REPRESENTATION
dc.subject: SPEECH RECOGNITION
dc.subject: COMPUTER SCIENCE & ENGINEERING – Dissertation
dc.subject: COMPUTER SCIENCE- Dissertation
dc.subject: MSc (Major Component Research)
dc.title: Speech embedding with segregation of paralinguistic information for low-resource languages [en_US]
dc.type: Thesis-Abstract [en_US]

Files

Original bundle

Name: TH5107-1.pdf
Size: 90.75 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH5107-2.pdf
Size: 113.67 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH5107.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission