Fugu-MT Paper Translation (Summary): Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models

Paper Summary: Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models

arxiv url: http://arxiv.org/abs/2604.14920v1
Date: Thu, 16 Apr 2026 12:03:50 GMT
Status: Translation complete
Last updated in system: 2026-04-17 21:29:31.880765
Title: Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models
Title (reference translation): A two-axis generative reward model toward semantic and turn-taking robustness in interactive spoken dialogue models
Authors: Yifu Chen, Shengpeng Ji, Zhengqing Liu, Qian Chen, Wen Wang, Ziqing Wang, Yangzhuo Li, Tianle Liang, Zhou Zhao
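Abstract summary: Well-designed reward signals are essential for reinforcement learning (RL). The proposed model achieves state-of-the-art interaction-quality assessment performance across a wide variety of datasets.
Reference score (independently computed attention score): 45.119381322968735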
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Achieving seamless, human-like interaction remains a key challenge for full-duplex spoken dialogue models (SDMs). Reinforcement learning (RL) has substantially enhanced text- and vision-language models, while well-designed reward signals are crucial for the performance of RL. We consider RL a promising strategy to address the key challenge for SDMs. However, a fundamental barrier persists: prevailing automated metrics for assessing interaction quality rely on superficial proxies, such as behavioral statistics or timing-prediction accuracy, failing to provide reliable reward signals for RL. On the other hand, human evaluations, despite their richness, remain costly, inconsistent, and difficult to scale. We tackle this critical barrier by proposing a Dual-Axis Generative Reward Model, which is trained to understand complex interaction dynamics using a detailed taxonomy and an annotated dataset, produces a single score and, crucially, provides separate evaluations for semantic quality and interaction timing. Such dual outputs furnish precise diagnostic feedback for SDMs and deliver a dependable, instructive reward signal suitable for online reinforcement learning. Our model achieves state-of-the-art performance on interaction-quality assessment across a wide spectrum of datasets, spanning synthetic dialogues and complex real-world interactions.
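Abstract (reference translation): Achieving seamless, human-like dialogue is a key challenge for full-duplex spoken dialogue models (SDMs). Reinforcement learning (RL) has substantially strengthened text-language and vision-language models, and well-designed reward signals are essential for RL performance. We consider RL a promising strategy for tackling this key challenge for SDMs. However, a fundamental barrier remains: automated metrics for evaluating interaction quality rely on superficial proxies such as behavioral statistics or timing-prediction accuracy, and cannot provide reliable reward signals for RL. Human evaluation, on the other hand, is rich but costly, inconsistent, and hard to scale. We propose a Dual-Axis Generative Reward Model, trained with a detailed taxonomy and an annotated dataset to understand complex interaction dynamics; it produces a single score and, importantly, evaluates semantic quality and interaction timing separately. These dual outputs give precise diagnostic feedback for SDMs and provide a reliable, instructive reward signal suitable for online reinforcement learning. The model achieves state-of-the-art interaction-quality assessment performance across diverse datasets, spanning synthetic dialogues and complex real-world interactions.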
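The abstract says the reward model is generative and returns a single score plus separate semantic-quality and interaction-timing evaluations. It does not give the output format or say how the overall score relates to the two axes. The following is only a minimal sketch of how such dual outputs might be consumed in an RL loop. It assumes a hypothetical text format with lines like `Semantic: 4`, `Timing: 2`, `Overall: 3`, and it uses the overall score as the scalar reward while keeping the per-axis scores as diagnostics. The names `DualAxisJudgment`, `parse_judgment`, and `rl_reward` are illustrative and are not from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class DualAxisJudgment:
    semantic: float  # hypothetical semantic-quality score
    timing: float    # hypothetical interaction-timing (turn-taking) score
    overall: float   # hypothetical single overall score


# Assumed output format: one "<axis>: <number>" line per score.
# The paper's actual generation format is not specified in the abstract.
_SCORE_LINE = re.compile(
    r"^(semantic|timing|overall)\s*:\s*(-?\d+(?:\.\d+)?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_judgment(text: str) -> DualAxisJudgment:
    """Extract the three scores from generated text; other lines are ignored."""
    scores = {axis.lower(): float(value) for axis, value in _SCORE_LINE.findall(text)}
    missing = {"semantic", "timing", "overall"} - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return DualAxisJudgment(**scores)


def rl_reward(judgment: DualAxisJudgment) -> tuple[float, dict[str, float]]:
    """Return the overall score as the scalar reward, plus per-axis diagnostics."""
    return judgment.overall, {"semantic": judgment.semantic, "timing": judgment.timing}


if __name__ == "__main__":
    generated = (
        "Reasoning: the reply is relevant but interrupts the user.\n"
        "Semantic: 4\nTiming: 2\nOverall: 3"
    )
    reward, diagnostics = rl_reward(parse_judgment(generated))
    print(reward, diagnostics)  # 3.0 {'semantic': 4.0, 'timing': 2.0}
```

The two axes stay separate so that a low reward can be traced to either content or turn-taking. This mirrors the diagnostic role the abstract describes. Whether the real model's overall score is an independent prediction or a combination of the two axes is not stated.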


Related Papers