Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
- URL: http://arxiv.org/abs/2503.04721v1
- Date: Thu, 06 Mar 2025 18:59:16 GMT
- Title: Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
- Authors: Guan-Ting Lin, Jiachen Lian, Tingle Li, Qirui Wang, Gopala Anumanchipalli, Alexander H. Liu, Hung-yi Lee
- Abstract summary: Full-Duplex-Bench is a benchmark that systematically evaluates key conversational behaviors. We aim to advance spoken dialogue modeling and encourage the development of more interactive and natural dialogue systems.
- Score: 93.09944267871163
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Spoken dialogue modeling introduces unique challenges beyond text-based language modeling, demanding robust turn-taking, backchanneling, and real-time interaction. Although most Spoken Dialogue Models (SDMs) rely on half-duplex processing (handling speech one turn at a time), emerging full-duplex SDMs can listen and speak simultaneously, enabling more natural and engaging conversations. However, current evaluations of such models remain limited, often focusing on turn-based metrics or high-level corpus analyses (e.g., turn gaps, pauses). To address this gap, we present Full-Duplex-Bench, a new benchmark that systematically evaluates key conversational behaviors: pause handling, backchanneling, turn-taking, and interruption management. Our framework uses automatic metrics for consistent and reproducible assessments of SDMs' interactive performance. By offering an open and standardized evaluation benchmark, we aim to advance spoken dialogue modeling and encourage the development of more interactive and natural dialogue systems.
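The abstract does not define its automatic metrics. As a rough illustration only, the Python sketch below computes two of the corpus-level quantities it mentions, pauses and turn gaps, from voice-activity segments on separate user and model channels. The `Segment` format, the 0.2 s pause threshold, and the 2.0 s `max_gap` turn-taking rule are hypothetical assumptions, not Full-Duplex-Bench's actual metrics.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A voiced interval on one audio channel, in seconds (hypothetical format)."""
    start: float
    end: float


def pauses(segments: List[Segment], min_pause: float = 0.2) -> List[float]:
    """Silences within one speaker's channel lasting at least `min_pause` seconds."""
    ordered = sorted(segments, key=lambda s: s.start)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start - prev.end
        if gap >= min_pause:
            gaps.append(gap)
    return gaps


def turn_gap(user_end: float, model_segments: List[Segment]) -> Optional[float]:
    """Seconds from the end of the user's turn to the model's next speech onset.

    Returns None if the model does not start speaking at or after `user_end`.
    """
    onsets = [s.start for s in model_segments if s.start >= user_end]
    return min(onsets) - user_end if onsets else None


def took_turn(user_end: float, model_segments: List[Segment], max_gap: float = 2.0) -> bool:
    """Hypothetical rule: the model 'takes the turn' if it responds within `max_gap` seconds."""
    gap = turn_gap(user_end, model_segments)
    return gap is not None and gap <= max_gap


# Usage: the user pauses mid-utterance, finishes at 3.0 s, and the model replies at 3.4 s.
user = [Segment(0.0, 1.5), Segment(2.0, 3.0)]
model = [Segment(3.4, 5.0)]
print(pauses(user))           # -> [0.5]
print(turn_gap(3.0, model))   # about 0.4 (floating point)
print(took_turn(3.0, model))  # -> True
```

In this toy example, a model that started speaking during the user's 0.5 s pause would look like an interruption rather than a turn take; telling such pauses apart from genuine turn ends is one reason the abstract lists pause handling as a separate behavior to evaluate.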
Related papers
- A Multi-view Discourse Framework for Integrating Semantic and Syntactic Features in Dialog Agents [0.0]
Multiturn dialogue models aim to generate human-like responses by leveraging conversational context.
Existing methods often neglect the interactions between context utterances or treat all of them as equally significant.
This paper introduces a discourse-aware framework for response selection in retrieval-based dialogue systems.
(2025-04-12T04:22:18Z)
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
(2024-11-15T04:16:45Z)
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten is an end-to-end GPT-based model capable of effectively modeling the complex behaviors inherent in natural conversations with low latency. Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
(2024-10-23T11:58:58Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
(2022-06-22T18:19:32Z)
- Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Modeling [80.51094098799736]
We propose Bidirectional Information Decoupling Network (BiDeN) as a universal dialogue encoder.
BiDeN explicitly incorporates both the past and future contexts and can be generalized to a wide range of dialogue-related tasks.
Experimental results on datasets of different downstream tasks demonstrate the universality and effectiveness of our BiDeN.
(2022-04-18T03:51:46Z)
- DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework.
It not only performs turn-level evaluation but also holistically considers the quality of the entire dialogue.
Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
(2021-06-02T12:23:18Z)
- Neural Generation of Dialogue Response Timings [13.611050992168506]
We propose neural models that simulate the distributions of spoken response offsets.
The models are designed to be integrated into the pipeline of an incremental spoken dialogue system.
We show that human listeners consider certain response timings to be more natural based on the dialogue context.
(2020-05-18T23:00:57Z)