Fugu-MT Paper Translation (Abstract): RL-Guided Data Selection for Language Model Finetuning

Paper overview: RL-Guided Data Selection for Language Model Finetuning

arxiv url: http://arxiv.org/abs/2509.25850v1
Date: Tue, 30 Sep 2025 06:42:19 GMT
Status: Translation complete
Last updated in system: 2025-10-01 14:45:00.04623
Title: RL-Guided Data Selection for Language Model Finetuning
Title (reference translation): RL-Guided Data Selection for Language Model Finetuning
Authors: Animesh Jha, Harshit Gupta, Ananjan Nandi,
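Abstract summary: The paper proposes a tractable Markov Decision Process (MDP) formulation of data selection and trains agents with various reinforcement learning (RL) methods to learn optimal data selection policies. Across four datasets, training on the 5% subset selected by the approach matches or outperforms fine-tuning on the full dataset by up to 10.8 accuracy points.
Reference score (independently computed attention score): 3.477926761611361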
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data selection for finetuning Large Language Models (LLMs) can be framed as a budget-constrained optimization problem: maximizing a model's downstream performance under a strict training data budget. Solving this problem is generally intractable, and existing approximate approaches are pretraining-oriented and transfer poorly to the fine-tuning setting. We reformulate this problem as a tractable Markov Decision Process (MDP) and train agents using various Reinforcement Learning (RL) methods to learn optimal data selection policies, guided by an efficient, proxy-model-based reward signal. Across four datasets, training on a $5\%$ subset selected by our approach matches or outperforms fine-tuning on the full dataset by up to $10.8$ accuracy points, while cutting wall-clock training time by up to $2 \times$, highlighting the promise of RL-guided data selection.
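Abstract (reference translation): Data selection for fine-tuning large language models (LLMs) can be framed as a budget-constrained optimization problem: maximizing a model's downstream performance under a strict training data budget. Solving this problem is generally intractable, and existing approximate approaches are pretraining-oriented and transfer poorly to the fine-tuning setting. The paper reformulates the problem as a tractable Markov Decision Process (MDP) and trains agents with various Reinforcement Learning (RL) methods to learn optimal data selection policies, guided by an efficient, proxy-model-based reward signal. Across four datasets, training on a 5% subset selected by the approach matches or outperforms fine-tuning on the full dataset by up to 10.8 accuracy points, while cutting wall-clock training time by up to 2x, highlighting the promise of RL-guided data selection.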
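The abstract does not spell out the MDP's states, actions, or reward, so the following is only a toy sketch of the general idea, not the authors' method. It assumes an episode picks items one at a time without replacement until a budget is reached, uses plain REINFORCE with a moving-average baseline (one of many possible RL methods), and replaces the proxy-model reward with a hypothetical `proxy_reward` that averages made-up per-item quality scores. All function names and hyperparameters here are illustrative.

```python
import math
import random


def masked_softmax(logits, available):
    """Softmax restricted to the indices that have not been selected yet."""
    m = max(logits[i] for i in available)
    exps = [math.exp(logits[i] - m) for i in available]
    z = sum(exps)
    return [e / z for e in exps]


def run_episode(logits, budget, rng):
    """Select `budget` distinct items; each step is one MDP action."""
    available = list(range(len(logits)))
    trajectory = []  # (chosen item, items available at that step, their probs)
    for _ in range(budget):
        probs = masked_softmax(logits, available)
        k = rng.choices(range(len(available)), weights=probs)[0]
        trajectory.append((available[k], list(available), probs))
        available.pop(k)
    return trajectory


def proxy_reward(selected, quality):
    """Hypothetical stand-in for a proxy-model reward: mean item quality."""
    return sum(quality[i] for i in selected) / len(selected)


def train_selector(quality, budget, episodes=3000, lr=0.2, seed=0):
    """REINFORCE over per-item logits, with a moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(quality)
    baseline = None
    for _ in range(episodes):
        trajectory = run_episode(logits, budget, rng)
        reward = proxy_reward([a for a, _, _ in trajectory], quality)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log softmax w.r.t. logit i is 1[i == action] - prob_i.
        for action, avail, probs in trajectory:
            for i, p in zip(avail, probs):
                indicator = 1.0 if i == action else 0.0
                logits[i] += lr * advantage * (indicator - p)
    return logits


def select_subset(logits, budget):
    """Greedy readout of the learned policy: the `budget` highest logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:budget]


if __name__ == "__main__":
    quality = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.0, 0.4, 0.5, 0.6]  # made-up scores
    learned = train_selector(quality, budget=3)
    print(select_subset(learned, budget=3))
```

In a realistic setting the reward would come from evaluating a small proxy model trained on the selected data rather than from fixed scores, and the budget would be a fraction of the dataset (the abstract reports results for 5%).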

Related papers