Enhanced feature aggregation for deep neural network based speaker embedding
dc.contributor.author | Thevagumaran, R | |
dc.contributor.author | Sivaneswaran, T | |
dc.contributor.author | Karunarathne, B | |
dc.contributor.editor | Rathnayake, M | |
dc.contributor.editor | Adhikariwatte, V | |
dc.contributor.editor | Hemachandra, K | |
dc.date.accessioned | 2022-10-27T08:37:22Z | |
dc.date.available | 2022-10-27T08:37:22Z | |
dc.date.issued | 2022-07 | |
dc.description.abstract | This paper proposes a new feature aggregation mechanism for deep neural network based speaker embedding in text-independent speaker verification. In speaker verification models, frame-level features are fed into a pooling layer (the feature aggregation component) to obtain fixed-length utterance-level features. Our method exploits the correlation between frame-level features: dependencies between speaker-discriminative information are represented as weights, and the output is a fixed-length weighted mean feature. We apply this pooling mechanism to the ECAPA-TDNN baseline architecture. Compared with Attentive Statistics Pooling applied to the same baseline, training on the VoxCeleb1-dev set and evaluating on the VoxCeleb1-test set shows that our method reduces the equal error rate (EER) by 7.32% and the minimum normalized detection cost function (MinDCF_{10^-2}) by 7.34%. | en_US |
dc.identifier.citation | R. Thevagumaran, T. Sivaneswaran and B. Karunarathne, "Enhanced Feature Aggregation for Deep Neural Network Based Speaker Embedding," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-5, doi: 10.1109/MERCon55799.2022.9906175. | en_US |
dc.identifier.conference | Moratuwa Engineering Research Conference 2022 | en_US |
dc.identifier.department | Engineering Research Unit, University of Moratuwa | en_US |
dc.identifier.doi | 10.1109/MERCon55799.2022.9906175 | en_US |
dc.identifier.email | 170479N@uom.lk | |
dc.identifier.email | 170643m@uom.lk | |
dc.identifier.email | buddhika@cse.mrt.ac.lk | |
dc.identifier.faculty | Engineering | en_US |
dc.identifier.pgnos | ****** | en_US |
dc.identifier.place | Moratuwa, Sri Lanka | en_US |
dc.identifier.proceeding | Proceedings of Moratuwa Engineering Research Conference 2022 | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/19269 | |
dc.identifier.year | 2022 | en_US |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.relation.uri | https://ieeexplore.ieee.org/document/9906175 | en_US |
dc.subject | Text-independent speaker verification | en_US |
dc.subject | Speaker recognition | en_US |
dc.subject | ECAPA-TDNN | en_US |
dc.subject | Feature aggregation | en_US |
dc.title | Enhanced feature aggregation for deep neural network based speaker embedding | en_US |
dc.type | Conference-Full-text | en_US |
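The abstract names Attentive Statistics Pooling as the baseline that the proposed aggregation is measured against. This record does not specify the correlation-based weighting itself, so the sketch below covers only that baseline and makes no claim to be the authors' method. It is a minimal, stdlib-only illustration under stated assumptions. The parameter names `W`, `b`, `v`, `k` and the tanh nonlinearity are illustrative choices, and ECAPA-TDNN actually uses a channel- and context-dependent variant of this pooling. Each frame receives a scalar attention score. A softmax over time turns the scores into weights, and the weighted mean and weighted standard deviation together form the fixed-length utterance-level vector.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attentive_stats_pool(
    frames: List[Vector], W: List[Vector], b: Vector, v: Vector, k: float = 0.0
) -> Tuple[Vector, Vector]:
    """Sketch of attentive statistics pooling over T frames of dimension D.

    Assumed (illustrative) parameters: W is an H x D projection, b a length-H
    bias, v a length-H scoring vector and k a scalar bias.
    """
    # Scalar attention score per frame: e_t = v . tanh(W h_t + b) + k
    scores = []
    for h in frames:
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * z_i for v_i, z_i in zip(v, hidden)) + k)

    # Softmax over time; subtracting the max keeps exp() numerically stable
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    dim = len(frames[0])
    # Attention-weighted mean per dimension
    mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    # Weighted std: sqrt(E[h^2] - mean^2), floored to avoid sqrt of a
    # tiny negative value caused by rounding
    std = [
        math.sqrt(max(sum(a * h[d] ** 2 for a, h in zip(alphas, frames)) - mean[d] ** 2, 1e-10))
        for d in range(dim)
    ]
    return mean, std


# Usage: two 2-dimensional frames with zero attention parameters give
# uniform weights, so the result reduces to the plain mean and std.
mean, std = attentive_stats_pool([[1.0, 2.0], [3.0, 4.0]], W=[[0.0, 0.0]], b=[0.0], v=[0.0])
utterance_vector = mean + std  # fixed length 2 * D regardless of T
```

Concatenating mean and standard deviation is why pooled statistics have length 2D no matter how many frames the utterance contains. According to the abstract, the proposed method instead produces a weighted mean only, with weights derived from inter-frame correlation.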