Fugu-MT Paper Translation (Summary): AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

Paper overview: AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

arxiv url: http://arxiv.org/abs/2605.20643v1
Date: Wed, 20 May 2026 03:06:36 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.450443
Title: AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals
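Title (reference translation): AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Signals
Authors: Duy Nguyen, Hanqi Xiao, Archiki Prasad, Zaid Khan, Anirban Das, Austin Zhang, Sambit Sahu, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
Abstract summary: Self-distillation lets a language model learn on-policy from its own trajectories by using the same model as both student and teacher. This setup provides dense token-level feedback without relying on a separate external model. We introduce AVSD, a self-distillation method with multiple privileged-information views.
Reference score (site-computed attention score): 65.71225905458182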
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-distillation enables language models to learn on-policy from their own trajectories by using the same model as both student and teacher, with the teacher being conditioned on privileged information unavailable to the student. Such information can come in different types or views, such as solutions, demonstrations, feedback, or final answers. This setup provides dense token-level feedback without relying on a separate external model, but creates a fundamental asymmetry: the teacher may rely on view-specific information that the student cannot access at inference time. Moreover, the best type of privileged information is often task-dependent, making it difficult to choose a single teacher view. In this work, we address both these challenges jointly by introducing AVSD (Adaptive-View Self-Distillation), a novel method of self-distillation with multiple privileged-information views, which reconstructs token-level supervision by separating stable cross-view consensus from view-specific residual signals. AVSD identifies the consensus signal shared across views, which provides a reliable update direction, and then selectively adds the view-specific residual signal to adjust the update magnitude when it both aligns with the consensus direction and remains proportionate to the consensus signal. Experiments on math competition benchmarks (AIME24, AIME25, and HMMT25) show that AVSD consistently outperforms both single-view self-distillation baselines and GRPO, achieving average Avg@8 gains of 3.1% and 2.2% over the strongest baselines on Qwen3-8B and Qwen3-4B, respectively. Moreover, on code-generation benchmarks (Codeforces, LiveCodeBench v6) using Qwen3-8B, AVSD outperforms the single-view self-distillation baseline by 2.4% on average.
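Abstract (reference translation): Self-distillation lets a language model learn on-policy from its own trajectories by using the same model as both student and teacher, with the teacher conditioned on privileged information that the student cannot access. Such information comes in different types or views, such as solutions, demonstrations, feedback, or final answers. This setup provides dense token-level feedback without relying on a separate external model, but it creates a fundamental asymmetry: the teacher may rely on view-specific information that the student cannot access at inference time. Moreover, the best type of privileged information is often task-dependent, which makes it hard to choose a single teacher view. In this work, we address both challenges jointly by introducing AVSD (Adaptive-View Self-Distillation), a self-distillation method that uses multiple privileged-information views. AVSD identifies the consensus signal shared across views, which provides a reliable update direction, and then selectively adds view-specific residual signals to adjust the update magnitude when they align with the consensus direction and remain proportionate to the consensus signal. On math competition benchmarks (AIME24, AIME25, HMMT25), AVSD consistently outperforms both single-view self-distillation baselines and GRPO, with average Avg@8 gains of 3.1% and 2.2% over the strongest baselines on Qwen3-8B and Qwen3-4B, respectively. On code-generation benchmarks (Codeforces, LiveCodeBench v6) with Qwen3-8B, AVSD outperforms the single-view self-distillation baseline by 2.4% on average.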
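The abstract describes the consensus-plus-residual rule only in words. The Python sketch that follows is a minimal, hypothetical illustration of that idea under stated assumptions, not the authors' implementation: (1) each view's per-token signal is assumed to be the teacher-minus-student log-probability of the student's own sampled token, a common on-policy distillation choice; (2) the consensus is assumed to be the plain mean across views; (3) a residual counts as "aligned" if it has the same sign as the consensus and as "proportionate" if its magnitude is at most `ratio_cap` times the consensus magnitude. The names `per_token_signal`, `combine_views`, and `ratio_cap` are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def per_token_signal(
    teacher_logprobs: Sequence[float], student_logprobs: Sequence[float]
) -> List[float]:
    """Per-token signal for one teacher view (assumed form, not from the paper).

    Both inputs are log-probabilities of the student's own sampled tokens:
    the teacher is the same model conditioned on one privileged view, while
    the student sees only the prompt. A positive value means the teacher
    finds the token more likely than the student does.
    """
    if len(teacher_logprobs) != len(student_logprobs):
        raise ValueError("teacher and student must score the same tokens")
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]


def combine_views(
    view_signals: Sequence[Sequence[float]], ratio_cap: float = 1.0
) -> List[float]:
    """Hypothetical consensus-plus-gated-residual combination across views.

    view_signals[v][t] is the signal of view v at token t.
    """
    if not view_signals:
        raise ValueError("need at least one view")
    n_views = len(view_signals)
    n_tokens = len(view_signals[0])
    if any(len(view) != n_tokens for view in view_signals):
        raise ValueError("all views must cover the same tokens")

    combined: List[float] = []
    for t in range(n_tokens):
        values = [view[t] for view in view_signals]
        consensus = mean(values)  # assumed shared signal: sets the update direction
        update = consensus
        for value in values:
            residual = value - consensus  # view-specific part of this view's signal
            aligned = residual * consensus > 0  # same sign as the consensus
            proportionate = abs(residual) <= ratio_cap * abs(consensus)
            if aligned and proportionate:
                # Accepted residuals only enlarge the magnitude, never flip the sign.
                update += residual / n_views
        combined.append(update)
    return combined


if __name__ == "__main__":
    # Toy signals for 3 views x 3 tokens (made-up numbers, not from the paper).
    views = [
        [0.4, -0.2, 0.1],
        [0.2, -0.5, 0.9],  # outlier at the third token
        [0.3, -0.2, -0.1],
    ]
    print([round(x, 4) for x in combine_views(views)])
    # prints [0.3333, -0.3667, 0.3]
```

Because residuals around the mean sum to zero, adding only the accepted residuals divided by the number of views is the same as averaging the views after replacing every rejected view with the consensus. In the toy example, the second view's large value at the third token fails the proportionality gate, so it is ignored instead of inflating the update; raising `ratio_cap` would let it through.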

Related papers