Fugu-MT Paper Translation (Abstract): EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions

Paper abstract: EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions

arxiv url: http://arxiv.org/abs/2604.16456v1
Date: Wed, 08 Apr 2026 00:43:48 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:14.01492
Title: EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions
Title (reference translation): EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions
Authors: Smit Nautambhai Modi, Gandharv Mahajan, Marc Wetter, Randall Welles,
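Abstract summary: EchoChain is introduced as a controlled benchmark for evaluating full-duplex state-update reasoning under mid-speech interruptions. The benchmark generates scenario-driven conversations and injects interruptions at a standardized point relative to assistant speech onset. No evaluated system exceeds a 50% pass rate, leaving substantial room for improvement in mid-generation state revision.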
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-time voice assistants must revise task state when users interrupt mid-response, but existing spoken-dialog benchmarks largely evaluate turn-based interaction and miss this failure mode. We introduce EchoChain, a controlled benchmark for evaluating full-duplex state-update reasoning under mid-speech interruptions. EchoChain identifies three recurring failure patterns in post-interruption continuations: contextual inertia, interruption amnesia, and objective displacement. The benchmark generates scenario-driven conversations and injects interruptions at a standardized point relative to assistant speech onset, enabling controlled cross-model comparison. In a paired half-duplex control, total failures drop by 40.2% relative to interrupted runs, indicating that many errors are driven by state-update reasoning under interruption rather than task difficulty alone. Across evaluated real-time voice models, no system exceeds a 50% pass rate, showing substantial room for improvement in mid-generation state revision. EchoChain provides a reproducible benchmark for diagnosing state-update reasoning failures in full-duplex voice interaction.
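Abstract (reference translation): Real-time voice assistants must revise task state when a user interrupts mid-response, but existing spoken-dialog benchmarks mostly evaluate turn-based interaction and miss this failure mode. We introduce EchoChain, a controlled benchmark for evaluating full-duplex state-update reasoning under mid-speech interruptions. EchoChain identifies three recurring failure patterns in post-interruption continuations: contextual inertia, interruption amnesia, and objective displacement. The benchmark generates scenario-driven conversations and injects interruptions at a standardized point relative to assistant speech onset, which enables controlled cross-model comparison. In a paired half-duplex control, total failures drop by 40.2% relative to interrupted runs, indicating that many errors stem from state-update reasoning under interruption rather than from task difficulty alone. Across the evaluated real-time voice models, no system exceeds a 50% pass rate, showing substantial room for improvement in mid-generation state revision. EchoChain provides a reproducible benchmark for diagnosing state-update reasoning failures in full-duplex voice interaction.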

