MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
- URL: http://arxiv.org/abs/2511.00850v1
- Date: Sun, 02 Nov 2025 08:22:30 GMT
- Title: MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
- Authors: Yayue Deng, Guoqiang Hu, Haiyang Sun, Xiangyu Zhang, Haoyang Zhang, Fei Tian, Xuerui Yang, Gang Yu, Eng Siong Chng,
- Abstract summary: Spoken Dialogue Models (SDMs) have advanced rapidly, yet their ability to sustain genuinely interactive multi-turn conversations remains underexplored. We introduce Multi-Bench, the first benchmark explicitly designed to evaluate SDMs in multi-turn interactive dialogue.
- Score: 47.12029218296983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken Dialogue Models (SDMs) have advanced rapidly, yet their ability to sustain genuinely interactive multi-turn conversations remains underexplored, as most benchmarks focus on single-turn exchanges. We introduce Multi-Bench, the first benchmark explicitly designed to evaluate SDMs in multi-turn interactive dialogue with an emphasis on emotional intelligence. Multi-Bench employs a hierarchical structure with a basic track for emotion understanding and reasoning and an advanced track for emotion support and application. It comprises five carefully designed tasks and about 3.2K samples, ranging from emotion recognition to complex reasoning and interactive dialogue, supported by a reproducible evaluation framework. We evaluate six representative SDMs on eight subsets of Multi-Bench. Results show that while current SDMs achieve good performance on basic understanding tasks, they still have room for improvement in advanced multi-turn interactive dialogue and reasoning-related tasks, particularly in emotion awareness and application.
Related papers
- Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities [93.09944267871163]
Full-Duplex-Bench is a benchmark that systematically evaluates key interactive behaviors. By releasing our benchmark code, we aim to advance spoken dialogue modeling and the development of more natural and engaging SDMs.
arXiv Detail & Related papers (2025-03-06T18:59:16Z) - An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue [21.938414385824903]
This paper focuses on the task of addressee recognition: identifying who is being addressed to take the next turn. A subset of the corpus was annotated with addressee information, revealing that explicit addressees are indicated in approximately 20% of conversational turns.
arXiv Detail & Related papers (2025-01-28T02:27:55Z) - A Survey on Multi-Turn Interaction Capabilities of Large Language Models [47.05742294162551]
Multi-turn interaction in dialogue system research refers to a system's ability to maintain context across multiple dialogue turns. Recent advancements in large language models (LLMs) have significantly expanded the scope of multi-turn interaction.
arXiv Detail & Related papers (2025-01-17T05:21:49Z) - DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling [73.08187964426823]
Large language model (LLM)-enabled dialogue systems have become one of the central modes of human-machine interaction. This paper introduces a new research task, Dialogue Element MOdeling, and proposes a novel benchmark, DEMO, designed for comprehensive dialogue modeling and assessment.
arXiv Detail & Related papers (2024-12-06T10:01:38Z) - Empathy Through Multimodality in Conversational Interfaces [1.360649555639909]
Conversational Health Agents (CHAs) are redefining healthcare by offering nuanced support that transcends textual analysis to incorporate emotional intelligence.
This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue, especially in the realm of mental health support.
It adeptly interprets and responds to users' emotional states by analyzing multimodal cues, thus delivering contextually aware and empathetically resonant verbal responses.
arXiv Detail & Related papers (2024-05-08T02:48:29Z) - AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z) - Self-Explanation Prompting Improves Dialogue Understanding in Large
Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
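To make the idea concrete, here is a minimal, hypothetical sketch of a Self-Explanation-style prompt builder. The template wording, function name, and dialogue format are illustrative assumptions, not the paper's actual prompt; the only grounded element is the core mechanism from the abstract, namely asking the model to analyze each utterance before executing the task.

```python
# Hypothetical sketch of a Self-Explanation-style prompt (illustrative only;
# not the exact template from the paper).
def build_self_explanation_prompt(dialogue, task_instruction):
    """Ask the model to explain each utterance before executing the task."""
    lines = ["Dialogue:"]
    # Number each turn so the model's explanations can reference them in order.
    for i, (speaker, utterance) in enumerate(dialogue, start=1):
        lines.append(f"Turn {i} ({speaker}): {utterance}")
    lines.append(
        "First, briefly explain the intent of each turn in order. "
        "Then, using your explanations, complete the task below."
    )
    lines.append(f"Task: {task_instruction}")
    return "\n".join(lines)

prompt = build_self_explanation_prompt(
    [("User", "My order never arrived."), ("Agent", "Let me check the tracking.")],
    "Classify the user's emotion.",
)
print(prompt.splitlines()[0])  # → Dialogue:
```

Because the strategy is task-agnostic, the same builder can wrap any downstream instruction (classification, state tracking, response selection) without per-task prompt engineering.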
arXiv Detail & Related papers (2023-09-22T15:41:34Z) - DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder [19.51263716065853]
We propose DialogueBERT, a novel contextual dialogue encoder based on the popular pre-trained language model BERT.
Five self-supervised pre-training tasks are devised for learning the particularities of dialogue utterances.
DialogueBERT was pre-trained on 70 million real-scenario dialogues and then fine-tuned on three different downstream dialogue understanding tasks.
arXiv Detail & Related papers (2021-09-22T01:41:28Z) - Emotion Dynamics Modeling via BERT [7.3785751096660555]
We develop a series of BERT-based models to capture the inter-interlocutor and intra-interlocutor dependencies of the conversational emotion dynamics.
Our proposed models can attain around 5% and 10% improvement over the state-of-the-art baselines, respectively.
arXiv Detail & Related papers (2021-04-15T05:58:48Z) - Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning [50.5572111079898]
Multi-role dialogue understanding comprises a wide range of diverse tasks, such as question answering, act classification, and dialogue summarization.
While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive.
In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks.
arXiv Detail & Related papers (2020-02-27T04:36:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content and is not responsible for any consequences arising from its use.