Fugu-MT Paper Translation (Abstract): Why Alignment Must Precede Distillation: A Minimal Working Explanation

Paper abstract: Why Alignment Must Precede Distillation: A Minimal Working Explanation

arxiv url: http://arxiv.org/abs/2509.23667v1
Date: Sun, 28 Sep 2025 06:12:19 GMT
Status: translation complete
System update date: 2025-09-30 22:32:19.362169
Title: Why Alignment Must Precede Distillation: A Minimal Working Explanation
Title (reference translation): Why Alignment Must Precede Distillation: A Minimal Working Explanation
Authors: Sungmin Cha, Kyunghyun Cho
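Abstract summary: The standard KD -> Align workflow diminishes the model's capacity to align rare yet desirable behaviors. The paper shows that alignment must first be performed on a high-recall reference, before distillation.
Reference score (attention score computed independently by the site): 50.784080714897776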
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For efficiency, preference alignment is often performed on compact, knowledge-distilled (KD) models. We argue this common practice introduces a significant limitation by overlooking a key property of the alignment's reference model: its distributional recall. We show that the standard KD -> Align workflow diminishes the model's capacity to align rare yet desirable behaviors, even under strong preference signals. We instead demonstrate that reversing the pipeline (i.e., Align -> KD) is essential: alignment must first be performed on a high-recall reference before distillation. Our contributions are threefold. First, we provide a minimal working explanation of how the reference model constrains preference alignment objectives at a fundamental level. Second, we validate this theory in a controllable Mixture-of-Gaussians experiment, where low-recall anchoring consistently results in suboptimal model performance. Finally, we demonstrate that the same phenomenon holds in LLM alignment with the SmolLM2 family: models aligned after KD fail to effectively align target behaviors, resulting in substantially lower reward and target precision. In contrast, our proposed Align -> KD pipeline robustly aligns these behaviors, yielding models with superior target-oriented metrics and lower variance. Together, these results establish reference-model recall as a first-order design choice in alignment, offering a clear principle: alignment must precede distillation.
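Abstract (reference translation): For efficiency, preference alignment is often carried out on compact, knowledge-distilled (KD) models. We argue that this practice introduces a significant limitation because it overlooks a key property of the alignment's reference model, namely its distributional recall. We show that the standard KD -> Align workflow reduces the model's capacity to align rare yet desirable behaviors, even under strong preference signals. We show instead that reversing the pipeline (i.e., Align -> KD) is essential: alignment must first be performed on a high-recall reference before distillation. Our contributions are threefold. First, we provide a minimal working explanation of how the reference model constrains preference alignment objectives at a fundamental level. Second, we validate this theory in a controllable Mixture-of-Gaussians experiment, in which low-recall anchoring consistently leads to suboptimal model performance. Finally, we show that the same phenomenon holds in LLM alignment with the SmolLM2 family: models aligned after KD fail to align the target behaviors effectively, resulting in substantially lower reward and target precision. In contrast, the proposed Align -> KD pipeline aligns these behaviors robustly, yielding models with better target-oriented metrics and lower variance. Together, these results establish reference-model recall as a first-order design choice in alignment and offer a clear principle: alignment must precede distillation.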
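
The claim that the reference model constrains the alignment objective can be illustrated with a small numerical sketch. This is not the paper's experiment or its derivation. It assumes the commonly used KL-regularized objective, maximizing E_pi[r] - beta * KL(pi || pi_ref), whose maximizer over a finite set of behaviors is proportional to pi_ref(y) * exp(r(y) / beta). The three behaviors, the reward values, `beta`, and both reference distributions below are hypothetical numbers chosen only for illustration.

```python
import math


def kl_regularized_optimum(ref_probs, rewards, beta):
    """Maximizer of E_pi[r] - beta * KL(pi || pi_ref) over a finite set.

    The solution is pi*(y) proportional to pi_ref(y) * exp(r(y) / beta),
    so any behavior with near-zero reference mass stays near zero.
    """
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical behaviors: two common ones and one rare but desirable one.
rewards = [0.0, 0.0, 1.0]  # strong preference for the rare behavior (assumed)
beta = 0.1                 # assumed regularization strength

# High-recall reference: the rare behavior keeps a small share of the mass.
high_recall_ref = [0.495, 0.495, 0.01]
# Low-recall reference: the rare behavior is almost dropped, as might happen
# after distillation (an assumption for this toy case). The values are only
# approximately normalized; the function renormalizes the result.
low_recall_ref = [0.5, 0.5, 1e-8]

aligned_high = kl_regularized_optimum(high_recall_ref, rewards, beta)
aligned_low = kl_regularized_optimum(low_recall_ref, rewards, beta)

print(f"high-recall reference: target mass {aligned_high[2]:.4f}")
print(f"low-recall reference:  target mass {aligned_low[2]:.6f}")
```

With identical rewards and the same `beta`, the aligned policy anchored to the high-recall reference places most of its mass on the rare behavior, while the one anchored to the low-recall reference leaves it close to zero. In this toy setting the preference signal can only rescale mass that the reference already assigns.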
