End To End Model For Speaker Identification With Minimal Training Data

Balakrishnan, S; Jathusan, K; Thayasivam, U

Date

2021-07

Authors

Balakrishnan, S

Jathusan, K

Thayasivam, U

Publisher

IEEE

Abstract

Deep learning has become widely adopted for speaker identification, where it outperforms GMM and i-vector approaches. Neural network approaches have obtained promising results when fed raw speech samples directly. SincNet is a modified Convolutional Neural Network (CNN) architecture based on parameterized sinc functions, which offer a very compact way to derive a customized filter bank for short utterances. This paper proposes an attention-based Long Short-Term Memory (LSTM) architecture that encourages the discovery of more meaningful speaker-related features with minimal training data. An attention layer built using neural networks offers a unique and efficient representation of speaker characteristics by exploring the connection between an aspect and the content of short utterances. In the speaker identification experiments carried out, the proposed approach converges faster and performs better than SincNet.
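
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 16 kHz sample rate, the 251-tap kernel, the Hamming window, and the dot-product scoring are all assumptions made for illustration. The first function builds a band-pass filter that is fully determined by two cutoff frequencies, which is the idea behind a parameterized sinc filter bank. The second pools a sequence of frame-level features (for example, LSTM outputs) into one utterance-level vector using softmax attention weights; in a trained model the query vector and the scoring network would be learned.

```python
import math


def sinc_bandpass(f_low_hz, f_high_hz, sample_rate_hz=16000, kernel_size=251):
    """Hamming-windowed band-pass FIR kernel defined only by two cutoffs.

    The kernel is the difference of two ideal low-pass (sinc) filters.
    Defaults are illustrative assumptions, not values from the paper.
    """
    if not 0 <= f_low_hz < f_high_hz <= sample_rate_hz / 2:
        raise ValueError("need 0 <= f_low < f_high <= Nyquist")
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")

    f1 = f_low_hz / sample_rate_hz   # cutoffs in cycles per sample
    f2 = f_high_hz / sample_rate_hz
    half = kernel_size // 2

    def norm_sinc(x):
        # Normalized sinc: sin(pi x) / (pi x), with the limit 1 at x = 0.
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    kernel = []
    for i in range(kernel_size):
        n = i - half
        ideal = 2 * f2 * norm_sinc(2 * f2 * n) - 2 * f1 * norm_sinc(2 * f1 * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def magnitude_response(kernel, freq_hz, sample_rate_hz=16000):
    """Magnitude of the kernel's frequency response at one frequency."""
    omega = 2 * math.pi * freq_hz / sample_rate_hz
    real = sum(h * math.cos(omega * n) for n, h in enumerate(kernel))
    imag = sum(h * math.sin(omega * n) for n, h in enumerate(kernel))
    return math.hypot(real, imag)


def attention_pool(frames, query):
    """Softmax-weighted average of frame vectors.

    frames: list of T equal-length feature vectors.
    query:  vector of the same length; learned in a real model,
            supplied by hand here (assumption).
    """
    scores = [sum(h * q for h, q in zip(frame, query)) for frame in frames]
    top = max(scores)                       # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights
```

With these definitions, `sinc_bandpass(300.0, 3000.0)` yields a kernel that passes a 1 kHz tone and strongly attenuates a 6 kHz tone, which can be checked with `magnitude_response`. Only the two cutoffs would need to be learned per filter, which is why such a filter bank is compact.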

Keywords

Speaker recognition, Neural networks, Attention layer

Citation

S. Balakrishnan, K. Jathusan and U. Thayasivam, "End To End Model For Speaker Identification With Minimal Training Data," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 456-461, doi: 10.1109/MERCon52712.2021.9525740.

URI

http://dl.lib.uom.lk/handle/123/19152

DOI

10.1109/MERCon52712.2021.9525740

Collections

MERCon - 2021
