Unified deep convolutional network for robust and highly generalized speaker clustering

Suntharam, K; Janakan, S; Thayasivam, U

Files

1571152790.pdf (2.69 MB)

Date

2025

Authors

Suntharam, K

Janakan, S

Thayasivam, U

Publisher

IEEE

Abstract

Speaker clustering (SC) is the task of grouping utterances by speaker without prior knowledge of the number or identity of the speakers. In this paper, we describe the application of transfer learning to a modified Visual Geometry Group network (VGGish) trained on AudioSet data for large-scale audio classification. We transferred the knowledge from VGGish, integrated a Micro CNN architecture, and enhanced voice feature modeling for the SC task. With our hybrid embedding extraction method (VGGish-SC), we outperformed state-of-the-art SC methods in terms of misclassification rate (MR) on the TIMIT and VCTK datasets. Our experiments showed that the proposed method improved on state-of-the-art approaches by 25% in-domain and by 75% out-of-domain. We also reported, for the first time, baseline SC results on noisy utterances, speaker accent variations, and language variations.

Keywords

Audio Classification, Transfer Learning, Speaker Clustering

URI

https://dl.lib.uom.lk/handle/123/24731

Collections

MERCon - 2025
