Cross-domain bimodal SER for customer service and TV show domains

Date

2025

Publisher

Department of Computer Science and Engineering

Abstract

Speech is the most common and expedient form of human communication, and understanding it is one of the more complex tasks the human brain performs. As technology advances, replicating this ability in machines has become essential, leading to the rise of Speech Emotion Recognition (SER) as a key field in artificial intelligence and human-computer interaction. However, accurately recognizing emotions from speech is made harder by the variability of emotional expression across contexts [1]. In customer service interactions, emotions such as happiness or frustration are often conveyed subtly, whereas in TV shows they are exaggerated for dramatic effect. This contrast is a challenge for SER models, since emotional expression differs significantly between the two domains.
