Fugu-MT Paper Translation (Abstract): Enhancing Speech Large Language Models through Reinforced Behavior Alignment

Paper overview: Enhancing Speech Large Language Models through Reinforced Behavior Alignment

arxiv url: http://arxiv.org/abs/2509.03526v1
Date: Mon, 25 Aug 2025 07:31:48 GMT
Status: Translation complete
Last updated in system: 2025-09-07 09:10:15.291
Title: Enhancing Speech Large Language Models through Reinforced Behavior Alignment
Title (reference translation): Strengthening Speech Large Language Models via Reinforced Behavior Alignment
Authors: Yansong Liu, Jiateng Li, Yuan Liu
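Abstract summary: This paper proposes a framework called Reinforced Behavior Alignment (RBA) to improve language generation capability. Rather than relying on supervised fine-tuning from human annotations, RBA uses a self-synthesis methodology to generate extensive, high-fidelity alignment data. Experiments show that the method effectively improves the instruction-following capability of SpeechLMs.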
Reference score (independently computed attention score): 5.647822820528311
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Large Language Models (LLMs) have spurred considerable research interest in extending their linguistic capabilities beyond text to other modalities, leading to the emergence of speech-based LLMs (SpeechLMs) capable of processing user requests in either speech or text form. However, owing to inter-modal discrepancies, these SpeechLMs still exhibit a significant performance gap compared to their text-based LLM counterparts in instruction-following, particularly when confronted with the dynamic and variable nature of user speech. To address this challenge, this paper introduces a framework termed Reinforced Behavior Alignment (RBA), designed to bolster the language generation proficiency of SpeechLMs. Instead of relying on supervised fine-tuning from human annotations, RBA employs a self-synthesis methodology to generate extensive, high-fidelity alignment data with a powerful teacher LLM. The SpeechLM's behavior is then aligned with that of the teacher using a reinforcement learning-based approach. Experimental results demonstrate that this method effectively enhances the instruction-following capabilities of SpeechLMs, outperforming conventional distillation baselines. Crucially, we demonstrate that RBA can be seamlessly extended to tasks including spoken question answering and speech-to-text translation, attaining state-of-the-art performance on open benchmarks with only self-generated data.
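Abstract (reference translation): Recent progress in Large Language Models (LLMs) has stimulated research interest in extending their language capabilities beyond text to other modalities, leading to the emergence of speech-based LLMs (SpeechLMs) able to process user requests in speech or text form. However, because of inter-modal discrepancies, these SpeechLMs show a large performance gap in instruction following compared with text-based LLMs, especially when faced with the dynamic and variable nature of user speech. This paper proposes a framework called Reinforced Behavior Alignment (RBA) aimed at improving language generation capability. Rather than relying on supervised fine-tuning from human annotations, RBA uses a self-synthesis methodology to generate extensive, high-fidelity alignment data with a powerful teacher LLM. The SpeechLM is then aligned with the teacher's behavior using a reinforcement learning-based approach. Experiments confirm that the method effectively improves the instruction-following performance of SpeechLMs and outperforms conventional distillation baselines. Importantly, RBA is shown to extend seamlessly to tasks such as spoken question answering and speech-to-text translation, achieving state-of-the-art performance on open benchmarks using only self-generated data.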

List of related papers