Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
- URL: http://arxiv.org/abs/2510.09592v1
- Date: Fri, 10 Oct 2025 17:50:59 GMT
- Title: Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
- Authors: Donghang Wu, Haoyang Zhang, Jun Chen, Xiangyu Zhang, Hexin Liu, Eng Siong Chng, Fei Tian, Xuerui Yang, Daxin Jiang, Gang Yu
- Abstract summary: We present Mind-Paced Speaking (MPS), a brain-inspired framework that enables high-fidelity, real-time reasoning. MPS employs a "Formulation Brain" for high-level reasoning to pace and guide a separate "Articulation Brain" for fluent speech generation.
- Score: 81.9612057950385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time Spoken Language Models (SLMs) struggle to leverage Chain-of-Thought (CoT) reasoning due to the prohibitive latency of generating the entire thought process sequentially. Enabling SLMs to think while speaking, similar to humans, is attracting increasing attention. We present, for the first time, Mind-Paced Speaking (MPS), a brain-inspired framework that enables high-fidelity, real-time reasoning. Similar to how humans utilize distinct brain regions for thinking and responding, we propose a novel dual-brain approach, employing a "Formulation Brain" for high-level reasoning to pace and guide a separate "Articulation Brain" for fluent speech generation. This division of labor eliminates mode-switching, preserving the integrity of the reasoning process. Experiments show that MPS significantly outperforms existing think-while-speaking methods and achieves reasoning performance comparable to models that pre-compute the full CoT before speaking, while drastically reducing latency. Under a zero-latency configuration, the proposed method achieves an accuracy of 92.8% on the mathematical reasoning task Spoken-MQA and attains a score of 82.5 on the speech conversation task URO-Bench. Our work effectively bridges the gap between high-quality reasoning and real-time interaction.
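The abstract's core idea, letting a reasoning model pace a speech model step by step instead of pre-computing the full chain of thought, can be illustrated with a minimal sketch. The function names and canned reasoning steps below are hypothetical stand-ins, not the authors' actual architecture:

```python
# Illustrative dual-brain "think while speaking" loop.
# formulation_brain and articulation_brain are stubs standing in for
# two separate models; they are NOT the paper's implementation.

from collections import deque

def formulation_brain(question):
    """Yield high-level reasoning steps one at a time (stub)."""
    for step in ["parse the question", "recall relevant facts", "derive the answer"]:
        yield step

def articulation_brain(step):
    """Turn one reasoning step into a fluent spoken phrase (stub)."""
    return f"Let me {step}."

def mind_paced_speaking(question):
    """Speak as soon as each reasoning step is ready, instead of
    waiting for the entire chain of thought before the first word."""
    spoken = []
    plan = deque()  # reasoning steps produced but not yet voiced
    for step in formulation_brain(question):
        plan.append(step)                                  # thinking continues...
        spoken.append(articulation_brain(plan.popleft()))  # ...and paces speech
    return " ".join(spoken)

print(mind_paced_speaking("What is 12 x 9?"))
# -> Let me parse the question. Let me recall relevant facts. Let me derive the answer.
```

The point of the sketch is the interleaving: speech generation is driven by the arrival of reasoning steps, so there is no up-front pause for a full CoT and no mode-switch inside a single model.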
Related papers
- To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks [56.11584171938381]
Theory of Mind (ToM) assesses whether models can infer hidden mental states such as beliefs, desires, and intentions. Recent progress in Large Reasoning Models (LRMs) has boosted step-by-step inference in mathematics and coding. We present a systematic study of nine advanced Large Language Models (LLMs) comparing reasoning models with non-reasoning models.
arXiv Detail & Related papers (2026-02-11T08:16:13Z) - Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning [31.790359663851305]
Genuine affective intelligence requires explicit modeling of Theory of Mind (ToM), the cognitive substrate from which emotions arise. We introduce HitEmotion, a ToM-grounded hierarchical benchmark that diagnoses capability breakpoints across increasing levels of cognitive depth. We also propose a ToM-guided reasoning chain that tracks mental states and calibrates cross-modal evidence to achieve faithful emotional reasoning.
arXiv Detail & Related papers (2026-02-01T02:26:12Z) - Chronological Thinking in Full-Duplex Spoken Dialogue Language Models [66.84843878538207]
Chronological Thinking aims to improve response quality in full-duplex Spoken Dialogue Language Models (SDLMs). It adds no additional latency: once the user stops speaking, the agent halts thinking and begins speaking without further delay. Experiments demonstrate the effectiveness of chronological thinking through both objective metrics and human evaluations.
arXiv Detail & Related papers (2025-10-02T10:28:11Z) - Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech [41.625380059502675]
Think-Verbalize-Speak is a framework that decouples reasoning from spoken delivery. We also introduce ReVerT, a latency-efficient verbalizer based on incremental and asynchronous summarization. Experiments across multiple benchmarks show that our method enhances speech naturalness and conciseness with minimal impact on reasoning.
arXiv Detail & Related papers (2025-09-19T14:34:22Z) - Mini-Omni-Reasoner: Token-Level Thinking-in-Speaking in Large Speech Models [80.75260664100644]
Mini-Omni-Reasoner is a framework that enables reasoning within speech via a novel "Thinking-in-Speaking" formulation. It interleaves silent reasoning tokens with spoken response tokens at the token level. It achieves a +19.1% gain in arithmetic reasoning and +6.4% in contextual understanding, with shorter outputs and zero decoding latency.
arXiv Detail & Related papers (2025-08-18T15:14:04Z) - STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models [131.90117151306993]
Spoken Language Models (SLMs) are designed to take speech inputs and produce spoken responses. Current SLMs lack the ability to perform an internal, unspoken thinking process before responding. We propose Stitch, a novel generation method that alternates between the generation of unspoken reasoning chunks and spoken response chunks.
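The chunk-level alternation this summary describes can be sketched as follows; the function and the example chunks are hypothetical illustrations of the idea, not Stitch's actual decoding procedure:

```python
# Hypothetical sketch: alternate unspoken reasoning chunks with spoken
# response chunks; only the spoken chunks reach the listener.

def alternate_chunks(reasoning_chunks, response_chunks):
    """Interleave silent (THINK) and voiced (SPEAK) chunks in order,
    returning the full interleaved transcript and the audible text."""
    transcript, spoken = [], []
    for think, speak in zip(reasoning_chunks, response_chunks):
        transcript.append(("THINK", think))  # generated but never voiced
        transcript.append(("SPEAK", speak))  # streamed to the user
        spoken.append(speak)
    return transcript, " ".join(spoken)

transcript, audible = alternate_chunks(
    ["estimate the sum", "check the carry"],
    ["The total is", "forty-two."],
)
print(audible)  # -> The total is forty-two.
```

The listener hears only the SPEAK chunks, while the THINK chunks let the model reason between spoken segments without a long silent pause before the response begins.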
arXiv Detail & Related papers (2025-07-21T08:30:03Z) - On the Thinking-Language Modeling Gap in Large Language Models [68.83670974539108]
We show that there is a significant gap between the modeling of languages and thoughts. We propose a new prompt technique termed Language-of-Thoughts (LoT) to demonstrate and alleviate this gap.
arXiv Detail & Related papers (2025-05-19T09:31:52Z) - Improving Semantic Understanding in Speech Language Models via Brain-tuning [19.732593005537606]
Speech language models align with human brain responses to natural language to an impressive degree. Current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics. We address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings.
arXiv Detail & Related papers (2024-10-11T20:06:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.