Fugu-MT Paper Translation (Summary): Efficiently Aligning Language Models with Online Natural Language Feedback

Paper summary: Efficiently Aligning Language Models with Online Natural Language Feedback

arxiv url: http://arxiv.org/abs/2605.04356v1
Date: Tue, 05 May 2026 23:25:00 GMT
Status: Translation complete
Last updated in system: 2026-05-07 18:41:07.577542
Title: Efficiently Aligning Language Models with Online Natural Language Feedback
Title (reference translation): Efficient Alignment of Language Models Using Online Natural Language Feedback
Authors: Christine Ye, Joe Benton
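Abstract summary: The authors develop methods to align language models in fuzzy domains using online natural language feedback. Proxy reward models are constructed from language models using in-context learning (ICL) and fine-tuning. The results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.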
Reference score (independently computed attention score): 2.821655149272041
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning with verifiable rewards has been used to elicit impressive performance from language models in many domains. But, broadly beneficial deployments of AI may require us to train models with strong capabilities in "fuzzy", hard-to-supervise domains. In this paper, we develop methods to align language models in fuzzy domains where human experts are still able to provide high-quality supervision signal, but only for a small number of model outputs, using online natural language feedback. Specifically, we train models by iteratively optimizing against proxy reward signals, stopping at the point of over-optimization, collecting fresh expert supervision, and updating the proxy reward. We construct proxy reward models from language models using in-context learning (ICL) and fine-tuning. We test our methods by eliciting creative writing and alignment research capabilities in Qwen3-8B and Haiku 4.5 respectively. For Qwen3-8B, ICL methods recover up to 35% of performance with 50x fewer expert samples, while fine-tuning methods recover 80% with up to 20x fewer samples and 100% with 3x fewer samples. For Haiku 4.5, ICL methods recover up to 35% of performance with 30x fewer samples, and fine-tuning methods recover 100% with 10x fewer samples. Our results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.
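Abstract (reference translation): Reinforcement learning with verifiable rewards has been used to draw impressive performance out of language models in many domains. However, broadly beneficial AI deployments may require training models with strong capabilities in "fuzzy", hard-to-supervise domains. This paper develops methods for aligning language models in fuzzy domains where human experts can still provide a high-quality supervision signal, but only for a small number of model outputs, using online natural language feedback. Specifically, the models are trained by iteratively optimizing against proxy reward signals, stopping at the point of over-optimization, collecting fresh expert supervision, and updating the proxy reward. Proxy reward models are constructed from language models using in-context learning (ICL) and fine-tuning. The methods are tested by eliciting creative writing capabilities in Qwen3-8B and alignment research capabilities in Haiku 4.5. For Qwen3-8B, ICL methods recover up to 35% of performance with 50x fewer expert samples, while fine-tuning methods recover 80% with up to 20x fewer samples and 100% with 3x fewer samples. For Haiku 4.5, ICL methods recover up to 35% of performance with 30x fewer samples, and fine-tuning methods recover 100% with 10x fewer samples. These results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.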
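The training loop described in the abstract (optimize against a proxy reward, stop at over-optimization, collect fresh expert supervision, update the proxy) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the "policy" is a single number, `expert_score` is a hypothetical scalar stand-in for natural language expert feedback, the proxy is a linear fit, and the `trust_radius` cutoff is a simple substitute for detecting the point of over-optimization.

```python
import random

def expert_score(x):
    """Hypothetical stand-in for costly expert feedback; quality peaks at x = 3."""
    return -(x - 3.0) ** 2

def fit_linear_proxy(labeled):
    """Least-squares line through (output, expert score) pairs, used as a proxy reward."""
    n = len(labeled)
    mean_x = sum(x for x, _ in labeled) / n
    mean_y = sum(y for _, y in labeled) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in labeled)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in labeled)
    slope = cov / var_x if var_x > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def align(policy=0.0, rounds=20, labels_per_round=4,
          trust_radius=0.5, step=0.1, seed=0):
    """Toy version of the iterate / stop / relabel / refit loop."""
    rng = random.Random(seed)
    expert_calls = 0
    for _ in range(rounds):
        # 1. Collect fresh expert supervision on a few current policy outputs.
        outputs = [policy + rng.uniform(-0.2, 0.2) for _ in range(labels_per_round)]
        labeled = [(x, expert_score(x)) for x in outputs]
        expert_calls += labels_per_round
        # 2. Update the proxy reward from the new labels.
        proxy = fit_linear_proxy(labeled)
        # 3. Optimize against the proxy, stopping once the policy has moved
        #    trust_radius away from where the proxy was fitted (an assumed
        #    stand-in for "the point of over-optimization").
        start = policy
        while abs(policy - start) < trust_radius:
            up, down = proxy(policy + step), proxy(policy - step)
            if up == down:  # flat proxy: nothing left to optimize
                break
            policy += step if up > down else -step
    return policy, expert_calls

if __name__ == "__main__":
    final_policy, calls = align()
    print(f"final policy: {final_policy:.2f}, expert labels used: {calls}")
```

The linear proxy is only trustworthy near the outputs it was fitted on, so following it without a cutoff would push the policy past the true optimum. Refitting on fresh labels each round is what keeps the proxy useful while the total number of expert labels stays small.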


Related papers