Unified deep convolutional network for robust and highly generalized speaker clustering

Suntharam, K; Janakan, S; Thayasivam, U

dc.contributor.author	Suntharam, K
dc.contributor.author	Janakan, S
dc.contributor.author	Thayasivam, U
dc.date.accessioned	2026-01-16T05:23:49Z
dc.date.issued	2025
dc.description.abstract	Speaker Clustering (SC) is the task of grouping speaker utterances into speaker-specific clusters without prior knowledge of the number or identity of the speakers. In this paper, we apply transfer learning to a modified Visual Geometry Group network (VGGish) trained on AudioSet data for large-scale audio classification. We transfer the knowledge from VGGish, integrate a Micro CNN architecture, and enhance the voice feature modeling for the SC task. With our hybrid embedding extraction method (VGGish-SC), we outperform state-of-the-art SC methods in terms of misclassification rate (MR) on the TIMIT and VCTK datasets. Our experiments show that the proposed methodology improves on state-of-the-art approaches by 25% in-domain and by 75% out-of-domain. We also report, for the first time, baseline results for SC on noisy utterances, speaker accent variations, and language variations.
dc.identifier.conference	Moratuwa Engineering Research Conference 2025
dc.identifier.department	Engineering Research Unit, University of Moratuwa
dc.identifier.email	sketharan1996.15@cse.mrt.ac.lk
dc.identifier.email	sarangan.15@cse.mrt.ac.lk
dc.identifier.email	rtuthaya@cse.mrt.ac.lk
dc.identifier.faculty	Engineering
dc.identifier.isbn	979-8-3315-6724-8
dc.identifier.pgnos	pp. 275-279
dc.identifier.proceeding	Proceedings of Moratuwa Engineering Research Conference 2025
dc.identifier.uri	https://dl.lib.uom.lk/handle/123/24731
dc.language.iso	en
dc.publisher	IEEE
dc.subject	Audio Classification
dc.subject	Transfer Learning
dc.subject	Speaker Clustering
dc.title	Unified deep convolutional network for robust and highly generalized speaker clustering
dc.type	Conference-Full-text

Files

Original bundle

Name: 1571152790.pdf
Size: 2.69 MB
Format: Adobe Portable Document Format

Collections

MERCon - 2025