Abstract:
Automatic Speech Recognition (ASR) has become a fast-growing research domain owing to advances in machine learning. Alongside the development of large training corpora, novel architectures for ASR models have pushed the performance limits of speech recognition systems. However, a significant gap in recognition accuracy remains between major world languages and low-resourced languages such as Sinhala, largely because of limited research on the latter. We applied enhanced time-delay neural network (TDNN) architectures, including the Multistream CNN architecture, to acoustic modeling for Sinhala ASR. Using the Kaldi ASR Toolkit, we trained ASR models on a publicly available corpus of over 200 hours of speech data. The results show a marked improvement in Sinhala speech recognition accuracy, with the Word Error Rate (WER) reduced to 25.12%.
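The reported 25.12% is a Word Error Rate. WER is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. A WER of 25.12% therefore means roughly one error per four reference words. The sketch below is a minimal illustration of this standard definition. It is not the paper's Kaldi scoring pipeline. The function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev is the previous DP row: edit distances between the current
    # reference prefix and every hypothesis prefix (row 0 = insertions only).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # → 0.3333333333333333
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.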
Citation:
D. Warusawithana, N. Kulaweera, L. Weerasinghe and B. Karunarathne, "Enhanced Time Delay Neural Network Architectures for Sinhala Speech Recognition," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906216.