Fugu-MT Paper Translation (Summary): Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

Paper summary: Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arxiv url: http://arxiv.org/abs/2605.20356v1
Date: Tue, 19 May 2026 18:11:03 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.31195
Title: Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
Title (reference translation): Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
Authors: Pablo Riera, Pablo Brusco, Cristina Kuo, Marcelo Sancinetti, S. R. K. Branavan,
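Abstract summary: Full-duplex spoken dialogue models can listen and speak simultaneously, enabling interaction closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, this study examines how such models coordinate their internal representations during interaction. Strong representational synchronization is found under noise-free conditions, peaking near zero lag and degrading as noise increases.
Reference score (independently computed attention score): 3.5946669116828134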
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Full-duplex spoken dialogue models (SDMs) can listen and speak simultaneously, enabling interaction dynamics closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, we study how such models coordinate their internal representations during interaction. We simulate full-duplex dialogues between two instances of the pretrained Moshi model under controlled conditions, manipulating channel noise and decoding bias. Synchronization is measured using Centered Kernel Alignment (CKA) across temporal lags, while anticipatory turn-taking cues are probed from delayed internal activations using causal LSTM models, from both speaker and listener perspectives. We find strong representational synchronization under noise-free conditions, peaking near zero lag and degrading with noise, and we show that internal states encode anticipatory information that supports turn-taking prediction ahead of time.
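Synchronization here is CKA measured across temporal lags. A minimal sketch of such a lag profile follows, assuming *linear* CKA (the abstract does not say whether a linear or kernel variant is used) and assuming each Moshi instance's activations have already been extracted as frame-aligned matrices of shape (frames, features). The names `linear_cka` and `lagged_cka` are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices whose rows are matched frames.

    X: (n_frames, d_x), Y: (n_frames, d_y). Columns are mean-centered, then
    CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F), a value in [0, 1].
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))


def lagged_cka(A: np.ndarray, B: np.ndarray, max_lag: int) -> dict[int, float]:
    """CKA between A[t] and B[t + lag] for every lag in [-max_lag, max_lag].

    A positive lag means B's representation trails A's by `lag` frames.
    """
    n = min(len(A), len(B))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = A[: n - lag], B[lag:n]
        else:
            a, b = A[-lag:n], B[: n + lag]
        scores[lag] = linear_cka(a, b)
    return scores


# Toy demo on synthetic data (not the paper's activations): B is a rotated,
# rescaled copy of A delayed by 3 frames, so the CKA profile peaks at lag 3.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
B = np.roll(2.0 * A @ Q, 3, axis=0)
profile = lagged_cka(A, B, max_lag=5)
print(max(profile, key=profile.get))  # 3
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling, which is why the rotated, rescaled copy still scores 1 at the matching lag. In the setting the abstract describes, a profile peaking at lag 0 would correspond to the reported near-zero-lag synchronization.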
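Abstract (reference translation): Full-duplex spoken dialogue models (SDMs) enable interaction dynamics closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, we examine how such models coordinate their internal representations during interaction. We simulate full-duplex dialogues between two instances of the pretrained Moshi model under controlled conditions, manipulating channel noise and decoding bias. Synchronization is measured across temporal lags using CKA (Centered Kernel Alignment), while anticipatory turn-taking cues are probed from delayed internal activations using causal LSTM models, from both the speaker and listener perspectives. We find strong representational synchronization under noise-free conditions, peaking near zero lag and degrading with noise, and show that internal states encode anticipatory information that supports predicting turn-taking ahead of time.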
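Turn-taking cues are probed from delayed internal activations with causal LSTM models. A minimal numpy sketch of that setup follows, under stated assumptions: the probe at frame t sees activations only up to frame t - d (the hypothetical `delay` helper), the target marks whether the active speaker changes within the next `horizon` frames (a hypothetical labeling scheme), and the LSTM is unidirectional so no future frames leak in. Helper names, horizon, and sizes are illustrative; weights are random and training (e.g. with a binary cross-entropy loss in a deep-learning framework) is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def anticipatory_labels(speaker: np.ndarray, horizon: int) -> np.ndarray:
    """Label frame t as 1 if the active speaker changes within frames t+1..t+horizon."""
    y = np.zeros(len(speaker), dtype=np.int64)
    for t in range(len(speaker)):
        future = speaker[t + 1 : t + 1 + horizon]
        y[t] = int(np.any(future != speaker[t]))
    return y


def delay(X: np.ndarray, d: int) -> np.ndarray:
    """Shift activations so row t holds X[t - d]; the first d rows are zeros."""
    out = np.zeros_like(X)
    out[d:] = X[: len(X) - d]
    return out


def init_params(d_in: int, d_hid: int, rng: np.random.Generator) -> dict:
    """Random (untrained) LSTM weights; gates are stacked as [input, forget, cell, output]."""
    s = 1.0 / np.sqrt(d_hid)
    return {
        "W": rng.uniform(-s, s, size=(4 * d_hid, d_in)),
        "U": rng.uniform(-s, s, size=(4 * d_hid, d_hid)),
        "b": np.zeros(4 * d_hid),
        "w_out": rng.uniform(-s, s, size=d_hid),
        "b_out": 0.0,
    }


def causal_lstm_probe(X: np.ndarray, p: dict) -> np.ndarray:
    """Unidirectional LSTM: the probability at frame t depends only on X[0..t]."""
    d_hid = p["U"].shape[1]
    h = np.zeros(d_hid)
    c = np.zeros(d_hid)
    probs = np.empty(len(X))
    for t, x in enumerate(X):
        z = p["W"] @ x + p["U"] @ h + p["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)  # cell state carries information forward only
        h = o * np.tanh(c)
        probs[t] = sigmoid(p["w_out"] @ h + p["b_out"])  # P(turn change soon)
    return probs


# Toy data (synthetic, not from the paper): 100 frames of 8-dim activations
# from one model instance and a speaker track that alternates every 25 frames.
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 8))
speaker = np.repeat([0, 1, 0, 1], 25)
labels = anticipatory_labels(speaker, horizon=5)  # 1 at frames 20-24, 45-49, 70-74
params = init_params(d_in=8, d_hid=16, rng=rng)
probs = causal_lstm_probe(delay(acts, d=2), params)
```

Because the LSTM runs strictly left to right and its input is delayed by d frames, any above-chance prediction of the label at frame t must come from information already present d frames earlier, which is what makes such a probe a test of anticipation. Feeding the other instance's activations instead would give the second (speaker or listener) perspective.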


List of related papers