Fugu-MT Paper Translation (Abstract): DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment

Paper overview: DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment

arxiv url: http://arxiv.org/abs/2604.01787v1
Date: Thu, 02 Apr 2026 08:55:53 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.626186
Title: DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment
Authors: Liang Zhu, Feiteng Fang, Yuelin Bai, Longze Chen, Zhexiang Zhang, Minghuan Tan, Min Yang
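Abstract summary: This paper proposes DEFT, an efficient alignment framework that combines data filtering with distributional guidance. Experiments show that methods enhanced by DEFT outperform the original methods in both alignment capability and generalization ability.
Reference score (independently computed attention score): 21.889327846803095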
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement Learning from Human Feedback (RLHF), using algorithms like Proximal Policy Optimization (PPO), aligns Large Language Models (LLMs) with human values but is costly and unstable. Alternatives have been proposed to replace PPO or integrate Supervised Fine-Tuning (SFT) and contrastive learning for direct fine-tuning and value alignment. However, these methods still require voluminous data to learn preferences and may weaken the generalization ability of LLMs. To further enhance alignment efficiency and performance while mitigating the loss of generalization ability, this paper introduces Distribution-guided Efficient Fine-Tuning (DEFT), an efficient alignment framework incorporating data filtering and distributional guidance by calculating the differential distribution reward based on the output distribution of the language model and the discrepancy distribution of preference data. A small yet high-quality subset is filtered from the raw data using a differential distribution reward, which is then incorporated into existing alignment methods to guide the model's output distribution. Experimental results demonstrate that the methods enhanced by DEFT outperform the original methods in both alignment capability and generalization ability, with significantly reduced training time.
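The filtering step described in the abstract (score each preference pair, then keep a small high-scoring subset) can be illustrated with a rough sketch. The abstract does not give the formula for the differential distribution reward, so everything below is an assumption made for illustration: the function names, the whitespace tokenization, and in particular `placeholder_reward`, which is a stand-in score and not the paper's reward. The sketch also leaves out the language model's output distribution, which the paper's reward does use.

```python
from collections import Counter


def token_freqs(texts):
    """Relative whitespace-token frequencies over a list of strings."""
    counts = Counter(tok for text in texts for tok in text.split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def discrepancy(pairs):
    """Per-token frequency gap between all chosen and all rejected responses.

    ASSUMPTION: a simple stand-in for the "discrepancy distribution of
    preference data" mentioned in the abstract.
    """
    chosen = token_freqs([c for c, _ in pairs])
    rejected = token_freqs([r for _, r in pairs])
    vocab = set(chosen) | set(rejected)
    return {t: chosen.get(t, 0.0) - rejected.get(t, 0.0) for t in vocab}


def placeholder_reward(pair, gap):
    """HYPOTHETICAL score, not the paper's reward.

    Measures how strongly this pair's own chosen-vs-rejected token gap
    agrees with the corpus-level gap.
    """
    c = token_freqs([pair[0]])
    r = token_freqs([pair[1]])
    vocab = set(c) | set(r)
    return sum((c.get(t, 0.0) - r.get(t, 0.0)) * gap.get(t, 0.0) for t in vocab)


def filter_top_fraction(pairs, fraction):
    """Keep the highest-scoring fraction of (chosen, rejected) pairs, at least one."""
    gap = discrepancy(pairs)
    ranked = sorted(pairs, key=lambda p: placeholder_reward(p, gap), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


if __name__ == "__main__":
    data = [
        ("helpful polite answer", "rude answer"),
        ("polite helpful reply", "rude reply"),
        ("same text", "same text"),  # carries no preference signal
    ]
    for chosen, rejected in filter_top_fraction(data, 0.67):
        print(chosen, "|", rejected)
```

A pair whose chosen and rejected responses are identical scores zero under this stand-in, so it is the first to be dropped. Any real use would replace `placeholder_reward` with the reward defined in the paper.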
