Speech to capability mapping framework for Sinhala language (Abstract)

Date

2018

Abstract

Matching voice input to a predefined set of capabilities is a key requirement in many application domains, such as robotics, automation, personal devices, building management, automobiles, and assistive systems for elderly and differently abled people. The usual approach is to convert speech to text using Automatic Speech Recognition (ASR) and then match the text against the list of predefined capabilities. Early attempts at speech recognition used dynamic time warping (DTW) algorithms. Later research used Hidden Markov Model (HMM) based approaches, which were more accurate than DTW-based approaches. More recently, deep learning has shown high-quality, near-human performance in converting speech to text. There have been a few prior research efforts towards a Sinhala ASR [1,2]. However, they have not achieved wide applicability or high accuracy. This research built a Sinhala speech dataset as well as a Tamil speech dataset and used them to implement a speech-to-intent classification model. The first version was built from scratch using deep learning techniques. The second version used transfer learning: we took a model trained for English on a large amount of English speech and fine-tuned it with the limited amount of Sinhala/Tamil speech data. This was the first research to experiment with transfer learning for speech intent classification.
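
The text-matching step of the ASR-based approach described above can be illustrated with a minimal sketch. It assumes the transcript is already available as text and scores it against example phrases by word overlap (Jaccard similarity). The capability names, example phrases, and the `match_capability` function are hypothetical and are not taken from the research itself, which used a learned speech-to-intent classifier rather than this kind of rule.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of whitespace-separated words."""
    return set(text.lower().split())


def match_capability(transcript, capabilities):
    """Return (capability name, score) for the best word-overlap match.

    `capabilities` maps a capability name to a list of example phrases.
    The score is the Jaccard similarity between the transcript's words and
    the best-matching example phrase. Returns (None, 0.0) if nothing overlaps.
    """
    words = tokenize(transcript)
    best_name, best_score = None, 0.0
    for name, phrases in capabilities.items():
        for phrase in phrases:
            phrase_words = tokenize(phrase)
            union = words | phrase_words
            score = len(words & phrase_words) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score


# Hypothetical capability list, for illustration only.
CAPABILITIES = {
    "lights_on": ["turn on the lights", "switch the light on"],
    "lights_off": ["turn off the lights", "switch the light off"],
    "check_balance": ["what is my account balance", "check my balance"],
}

name, score = match_capability("please turn on the lights", CAPABILITIES)
print(name, round(score, 2))
```

A word-overlap matcher like this depends entirely on the quality of the transcript, which is one reason a classifier that maps speech to intent directly can be attractive for a language with limited ASR resources.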