Integrating music information retrieval and transfer learning for advanced emotion recognition in Sri Lankan crowd soundscapes : dataset creation and analysis
Date
2025
Authors
Ariyathilake, P.B.S.N.
Abstract
Understanding crowd emotions through sound is critical for applications in event monitoring, public safety, and mental health studies. However, specialized datasets and robust models for classifying crowd sound emotions remain scarce. To address this, a comprehensive Sri Lankan Crowd Sound Emotion Dataset (SLCSED) was developed, enriched with detailed annotations, to support future research. The study proposes a computational framework that combines Music Information Retrieval (MIR) techniques with advanced machine learning algorithms to classify emotions in crowd sounds. Features were extracted using MIR methods, Wav2Vec 2.0 embeddings, and Emotion2Vec representations, and Principal Component Analysis (PCA) was applied for dimensionality reduction. Various machine learning and transfer learning classifiers were evaluated, including TabNet, LightGBM, Multi-Layer Perceptrons (MLP), Wav2Vec 2.0, and Emotion2Vec. Specific architectures were tuned for better accuracy, such as LightGBM with gradient boosting and MLPs with two hidden layers of 128 and 64 units. Furthermore, emotion recognition models were developed using supervised learning methods, drawing inspiration from approaches tested on decision trees, random forests, XGBoost, and LightGBM in related studies. The results were highly promising, with the LightGBM classifier achieving up to 99.95% validation accuracy on the Emotional Crowd Sounds Data (ECSD) dataset and the MLP achieving 99.53% on the SLCSED dataset without dimensionality reduction. PCA slightly reduced performance in most cases. By contrast, the Emotion2Vec framework improved significantly after PCA was applied, reaching 99.99% accuracy. These findings highlight the effectiveness of MIR-based feature engineering combined with carefully selected classifiers for crowd emotion detection.
This work not only fills a major gap by introducing a localized and richly annotated dataset but also presents a robust methodological pipeline for crowd sound emotion recognition, paving the way for future applications in real-world monitoring and psychological analysis.
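The comparison described in the abstract, a classifier trained with and without PCA, can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the thesis's actual pipeline: the feature matrix is synthetic stand-in data (not SLCSED or ECSD), and the number of classes, feature dimensionality, number of PCA components, scaling step, and the helper name `build_mlp` are all assumptions made for the example. Only the MLP hidden-layer sizes of (128, 64) come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-clip feature vectors (assumption: 40 features,
# 3 emotion classes). This is NOT the SLCSED or ECSD data.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=12,
    n_classes=3, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_mlp(use_pca: bool, n_components: int = 10):
    """Hypothetical helper: scale -> (optional PCA) -> MLP with (128, 64) hidden units."""
    steps = [StandardScaler()]
    if use_pca:
        # n_components=10 is an arbitrary illustrative choice.
        steps.append(PCA(n_components=n_components, random_state=0))
    steps.append(
        MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
    )
    return make_pipeline(*steps)


# Train both variants and record validation accuracy for each.
results = {}
for use_pca in (False, True):
    model = build_mlp(use_pca).fit(X_train, y_train)
    results[use_pca] = model.score(X_val, y_val)
    print(f"PCA={use_pca}: validation accuracy = {results[use_pca]:.3f}")
```

Accuracies on this synthetic data say nothing about the figures reported in the abstract. The sketch only shows how the with-PCA and without-PCA variants can share one pipeline definition so that the two are compared on the same split.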
Citation
Ariyathilake, P.B.S.N. (2025). Integrating music information retrieval and transfer learning for advanced emotion recognition in Sri Lankan crowd soundscapes : dataset creation and analysis [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24845
