Fugu-MT Paper Translation (Summary): Chronological Thinking in Full-Duplex Spoken Dialogue Language Models

Paper summary: Chronological Thinking in Full-Duplex Spoken Dialogue Language Models

arxiv url: http://arxiv.org/abs/2510.05150v2
Date: Wed, 08 Oct 2025 21:35:40 GMT
Status: Translation complete
Last updated in system: 2025-10-10 12:56:53.558888
Title: Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
Title (reference translation): Chronological thinking in full-duplex spoken dialogue language models
Authors: Donghang Wu, Haoyang Zhang, Chen Chen, Tianyu Zhang, Fei Tian, Xuerui Yang, Gang Yu, Hexin Liu, Nana Hou, Yuchen Hu, Eng Siong Chng
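Abstract summary: Chronological thinking aims to improve response quality in full-duplex SDLMs. No additional latency: once the user stops speaking, the agent stops thinking and begins speaking without further delay. Results: experiments on both objective metrics and human evaluations demonstrate the effectiveness of chronological thinking.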
Reference score (site-computed attention score): 66.84843878538207
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in spoken dialogue language models (SDLMs) reflect growing interest in shifting from turn-based to full-duplex systems, where the models continuously perceive user speech streams while generating responses. This simultaneous listening and speaking design enables real-time interaction and lets the agent handle dynamic conversational behaviors such as user barge-in. However, during the listening phase, existing systems keep the agent idle by repeatedly predicting the silence token, which departs from human behavior: we usually engage in lightweight thinking during conversation rather than remaining absent-minded. Inspired by this, we propose Chronological Thinking, an on-the-fly conversational thinking mechanism that aims to improve response quality in full-duplex SDLMs. Specifically, chronological thinking, purpose-built for streaming acoustic input, presents a paradigm shift from conventional LLM thinking approaches such as Chain-of-Thought. (1) Strictly causal: the agent reasons incrementally while listening, updating internal hypotheses only from past audio with no lookahead. (2) No additional latency: reasoning is amortized during the listening window; once the user stops speaking, the agent halts thinking and begins speaking without further delay. Experiments demonstrate the effectiveness of chronological thinking: both objective metrics and human evaluations show consistent improvements in response quality. Furthermore, chronological thinking robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.
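Abstract (reference translation): Recent advances in spoken dialogue language models (SDLMs) reflect growing interest in moving from turn-based to full-duplex systems. This simultaneous listening and speaking design enables real-time interaction and lets the agent handle dynamic conversational behaviors such as user barge-in. However, during the listening phase, existing systems keep the agent idle by repeatedly predicting the silence token, which departs from human behavior. This work therefore proposes Chronological Thinking, an on-the-fly conversational thinking mechanism aimed at improving response quality in full-duplex SDLMs. In particular, chronological thinking, designed for streaming acoustic input, represents a paradigm shift from conventional LLM thinking approaches such as Chain-of-Thought. (1) Strictly causal: the agent reasons incrementally while listening and updates its internal hypotheses only from past audio. (2) No additional latency: reasoning is amortized during the listening window; once the user stops speaking, the agent stops thinking and begins speaking without further delay. Experiments demonstrating the effectiveness of chronological thinking through both objective metrics and human evaluations show consistent improvements in response quality. Furthermore, chronological thinking robustly handles conversational dynamics and achieves competitive performance on full-duplex interaction metrics.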
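The listening/speaking behavior described in the abstract can be illustrated with a toy simulation. This is a minimal, hypothetical sketch, not the authors' implementation: it assumes one discrete step per incoming audio chunk, an externally supplied end-of-turn flag, and plain Python callables standing in for what a real SDLM would produce as learned tokens. The names `run_full_duplex`, `think`, `respond`, `AgentTrace`, and the `<silence>` string are all illustrative. It contrasts a baseline that emits a silence token while listening with a chronological variant that reasons over the audio prefix heard so far (strictly causal) and starts speaking at the same step once the user stops.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical placeholder for the silence token a baseline full-duplex
# model predicts while listening; the real token is model-specific.
SILENCE = "<silence>"


@dataclass
class AgentTrace:
    """What the agent emitted at each step, and when it started speaking."""
    outputs: List[str] = field(default_factory=list)
    first_speech_step: int = -1  # -1 means the agent never started speaking


def run_full_duplex(
    chunks: Sequence[str],
    user_done: Sequence[bool],
    think: Callable[[List[str]], str],
    respond: Callable[[List[str], List[str]], str],
    chronological: bool = True,
) -> AgentTrace:
    """Simulate one user turn, one step per incoming audio chunk (hypothetical).

    While the user is speaking, the baseline emits SILENCE; the chronological
    variant instead calls `think` on the audio heard so far. When `user_done`
    becomes True, the agent stops thinking and responds at that same step.
    """
    trace = AgentTrace()
    heard: List[str] = []
    thoughts: List[str] = []
    for step, (chunk, done) in enumerate(zip(chunks, user_done)):
        # Strict causality: only chunks up to the current step are visible.
        heard.append(chunk)
        if not done:
            if chronological:
                thought = think(list(heard))
                thoughts.append(thought)
                trace.outputs.append(thought)
            else:
                trace.outputs.append(SILENCE)
            continue
        # End of user turn: halt thinking and speak immediately, so no extra
        # steps are spent reasoning after the user stops.
        trace.outputs.append(respond(list(heard), list(thoughts)))
        trace.first_speech_step = step
        break
    return trace


if __name__ == "__main__":
    chunks = ["what", "is", "the", "weather", "<end-of-turn>"]
    user_done = [False, False, False, False, True]

    def think(heard: List[str]) -> str:
        # Toy stand-in for a learned reasoning step over the prefix.
        return f"<think:{heard[-1]}>"

    def respond(heard: List[str], thoughts: List[str]) -> str:
        # Toy stand-in for response generation conditioned on the thoughts.
        return f"answer using {len(thoughts)} thoughts"

    chrono = run_full_duplex(chunks, user_done, think, respond, chronological=True)
    baseline = run_full_duplex(chunks, user_done, think, respond, chronological=False)
    print(chrono.outputs)
    print(baseline.outputs)
    print(chrono.first_speech_step, baseline.first_speech_step)  # 4 4
```

In this toy setup both variants start speaking at the same step, which mirrors the "no additional latency" claim: the reasoning only occupies steps that the baseline would have filled with silence tokens. How the actual model decides when the user has stopped speaking, and what its thinking tokens look like, is not specified here.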


Related papers