Fugu-MT Paper Translation (Summary): Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

Paper summary: Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

arxiv url: http://arxiv.org/abs/2605.20946v1
Date: Wed, 20 May 2026 09:32:35 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.599925
Title: Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation
Authors: Xuan Du, Qiangyu Yan, Wenshuo Li, Borui Jiang, Changming Xiao, Han Shu, Xinghao Chen
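Abstract summary: A key challenge is maintaining fluent speech while performing deep reasoning. The proposed method, InterRS, addresses this by inserting reasoning steps only during natural speech generation. It introduces a novel pipeline that generates seamlessly interleaved audio data.
Reference score (independently computed attention score): 15.297424620191158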
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The thinking-while-speaking paradigm aims to make AI communication more human-like. A key challenge is maintaining fluent speech while performing deep reasoning. Our method, InterRS, tackles this by inserting reasoning steps only during natural speech generation. This requires high-quality data in which reasoning and speech are precisely aligned and the length ratio is controlled. We introduce a novel pipeline to generate such seamlessly interleaved audio data. To train our model, we combine interleaved SFT on refined data with reinforcement learning using two new rewards: a TA-Balance Reward to manage timing and the thinking-answer ratio, and a Linguistic Quality Reward to refine expression. Experiments show that our approach achieves 13% better performance on mathematical and logic benchmarks while responding as instantly as a spoken-language instruct model that outputs fast CoT responses. Furthermore, our method generates more natural and fluent answers than prior methods.
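The abstract names a TA-Balance Reward that manages timing and the thinking-answer ratio but does not define it. The sketch below is a hypothetical illustration of what such a reward could look like, not the authors' formulation. It assumes a response is a list of segments labeled "think" (silent reasoning) or "speak" (spoken answer), uses whitespace splitting as a stand-in tokenizer, and treats "timing" as requiring the response to open with speech. The names Segment, ta_balance_reward, target_ratio, and tolerance are all invented for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str  # hypothetical label: "think" (silent reasoning) or "speak" (spoken answer)
    text: str


def token_count(text: str) -> int:
    # Whitespace split as a stand-in for a real tokenizer (assumption).
    return len(text.split())


def ta_balance_reward(segments: List[Segment],
                      target_ratio: float = 1.0,
                      tolerance: float = 0.5) -> float:
    """Hypothetical TA-balance-style reward, not the paper's definition.

    Returns 0.0 if the response does not open with speech (a proxy for an
    instant response) or contains no spoken tokens. Otherwise returns
    exp(-|think/speak - target_ratio| / tolerance), which equals 1.0 when
    the thinking/answer length ratio hits the target and decays as it drifts.
    """
    if not segments or segments[0].kind != "speak":
        return 0.0
    think = sum(token_count(s.text) for s in segments if s.kind == "think")
    speak = sum(token_count(s.text) for s in segments if s.kind == "speak")
    if speak == 0:
        return 0.0
    ratio = think / speak
    return math.exp(-abs(ratio - target_ratio) / tolerance)


if __name__ == "__main__":
    # Speech first, a short reasoning step, then the spoken answer.
    example = [
        Segment("speak", "Let me work through this."),
        Segment("think", "12 times 4 is 48"),
        Segment("speak", "The answer is 48."),
    ]
    print(round(ta_balance_reward(example), 3))
```

An exponential decay is used here only because it gives a smooth, bounded signal in (0, 1]; a real RL reward could instead use a clipped linear penalty or a learned target ratio.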


Related papers