Fugu-MT Paper Translation (Abstract): WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Paper abstract: WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

arxiv url: http://arxiv.org/abs/2505.09558v1
Date: Wed, 14 May 2025 16:54:15 GMT
Status: Translation complete
Last updated in system: 2025-05-15 21:44:09.537044
Title: WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Title (reference translation): WavReward: Dialogue Models by Generalists
Authors: Shengpeng Ji, Tianle Liang, Yangzhuo Li, Jialong Zuo, Minghui Fang, Jinzheng He, Yifu Chen, Zhengqing Liu, Ziyue Jiang, Xize Cheng, Siqi Zheng, Jin Xu, Junyang Lin, Zhou Zhao
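Abstract summary: WavReward is a reward feedback model that can evaluate the IQ and EQ of spoken dialogue systems with speech input. ChatReward-30K is a preference dataset used to train WavReward. WavReward outperforms previous state-of-the-art evaluation models across multiple spoken dialogue scenarios.
Reference score (independently computed attention score): 57.80264359636158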
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models' conversational performance has largely been overlooked. This is primarily because intelligent chatbots convey a wealth of non-textual information that cannot be easily measured using text-based language models like ChatGPT. To address this gap, we propose WavReward, a reward feedback model based on audio language models that can evaluate both the IQ and EQ of spoken dialogue systems with speech input. Specifically, 1) based on audio language models, WavReward incorporates a deep reasoning process and a nonlinear reward mechanism for post-training. By utilizing multi-sample feedback via a reinforcement learning algorithm, we construct a specialized evaluator tailored to spoken dialogue models. 2) We introduce ChatReward-30K, a preference dataset used to train WavReward. ChatReward-30K includes both comprehension and generation aspects of spoken dialogue models. These scenarios span various tasks, such as text-based chats, nine acoustic attributes of instruction chats, and implicit chats. WavReward outperforms previous state-of-the-art evaluation models across multiple spoken dialogue scenarios, achieving a substantial improvement over Qwen2.5-Omni in objective accuracy, from 55.1% to 91.5%. In subjective A/B testing, WavReward also leads by a margin of 83%. Comprehensive ablation studies confirm the necessity of each component of WavReward. All data and code will be made publicly available at https://github.com/jishengpeng/WavReward after the paper is accepted.
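Abstract (reference translation): End-to-end spoken dialogue models such as GPT-4o-audio have recently attracted considerable attention in the speech domain. However, evaluation of the conversational performance of spoken dialogue models has largely been overlooked. This is because chatbots convey a wealth of non-textual information that cannot easily be measured with text-based language models such as ChatGPT. To address this gap, we propose WavReward, a reward feedback model based on audio language models that can evaluate both the IQ and EQ of spoken dialogue systems with speech input. Specifically, 1) building on audio language models, WavReward incorporates deep reasoning and a nonlinear reward mechanism. Using multi-sample feedback via a reinforcement learning algorithm, we construct a specialized evaluator suited to spoken dialogue models. 2) We introduce ChatReward-30K, a preference dataset used to train WavReward. ChatReward-30K covers both the comprehension and generation aspects of spoken dialogue models. These scenarios span a variety of tasks, including text-based chats, nine acoustic attributes of instruction chats, and implicit chats. WavReward outperforms previous state-of-the-art evaluation models across multiple spoken dialogue scenarios, with objective accuracy improving substantially from 55.1% to 91.5%. In subjective A/B testing, WavReward has a margin of 83%. Comprehensive ablation studies confirm the necessity of each component of WavReward. All data and code will be released at https://github.com/jishengpeng/WavReward after the paper is accepted.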

Related papers