Cross-domain bimodal SER for customer service and TV show domains

Premnath, N; Lakraj, P; Jayaweera, Y

Files

Paper 35 - ADScAI 2025.pdf (198.48 KB)

Date

2025

Authors

Premnath, N

Lakraj, P

Jayaweera, Y

Publisher

Department of Computer Science and Engineering

Abstract

Speech is the most common and convenient means of human communication, and understanding it is among the more complex tasks the human brain performs. As technology advances, replicating this ability in machines has become essential, and Speech Emotion Recognition (SER) has emerged as a key field within artificial intelligence and human-computer interaction. Recognizing emotions accurately from speech is difficult, however, because emotional expression varies across contexts [1]. In customer service interactions, emotions such as happiness or frustration are often conveyed subtly, whereas in TV shows they are exaggerated for dramatic effect. This contrast poses a challenge for SER models that must operate across both domains.