Fugu-MT Paper Translation (Abstract): ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation

Paper overview: ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation

arxiv url: http://arxiv.org/abs/2606.05718v1
Date: Thu, 04 Jun 2026 05:18:13 GMT
Status: Translation complete
System update date: 2026-06-05 22:39:44.570032
Title: ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation
Title (reference translation): ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation
Authors: Kanghui Tian, Siyuan Liu, Ziang Yan, Sheng Xia, Shuai Dong, Yi Wang,
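Abstract summary: ViCuR is a visually grounded privileged-teacher distillation framework. It replaces answer-side privilege with visual cues. It consistently improves over answer-based on-policy self-distillation.
Reference score (independently computed attention score): 5.131753844398592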
License: http://creativecommons.org/licenses/by/4.0/
Abstract: On-policy distillation (OPD) improves reasoning by training a student on trajectories sampled from its own policy under supervision from a teacher. In multimodal reasoning, a common extension is to use a privileged teacher that observes training-time-only signals such as reference answers or rationales. However, such answer-side privilege creates a train-test mismatch: the teacher's supervision may depend on signals unavailable to the student, encouraging shortcut imitation rather than visually grounded reasoning. We propose ViCuR, a visually grounded privileged-teacher distillation framework that replaces answer-side privilege with visual cues (query-related evidence in the input). Because these cues are derived from the same visual input available at inference, their evidence is recoverable by the student. To support this, ViCuR introduces a lightweight cue recovery module that uses dedicated sink-token cross-attention during prefill to aggregate task-relevant visual evidence into an internal representation, without changing the inference interface or requiring auxiliary cue-generation losses. Across seven benchmarks with Qwen3-VL-2B and 8B students, ViCuR consistently improves over answer-based on-policy self-distillation by +1.19 and +1.24 on overall average performance. It also extends naturally to stronger-teacher OPD, surpassing OPD baselines by +0.64 and +1.08, with consistent out-of-domain gains at the 8B scale. These results show that, in multimodal on-policy distillation, the design of teacher privilege is as important as teacher strength.
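The on-policy distillation setup described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss, so this assumes a per-token reverse KL divergence between student and teacher next-token distributions, evaluated on a trajectory the student sampled itself; this is a common choice for OPD, not necessarily the one used in ViCuR. The function names (`log_softmax`, `reverse_kl`, `opd_loss`) and the toy logits are hypothetical.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    # KL(student || teacher) for a single token position (assumed loss form).
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))

def opd_loss(student_steps, teacher_steps):
    # Average the per-token divergence over a student-sampled trajectory.
    # Each argument is a list of per-position logit lists of equal shape.
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)

# Toy logits for a two-token trajectory over a three-word vocabulary.
student_steps = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_steps = [[1.5, 1.0, -0.5], [0.0, 1.0, 0.0]]
loss = opd_loss(student_steps, teacher_steps)
```

In a privileged-teacher variant, the teacher logits would be computed with extra training-time context (reference answers in the baseline, visual cues in ViCuR) while the student sees only the normal inference input.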
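Abstract (reference translation): On-policy distillation (OPD) improves reasoning by training a student on trajectories sampled from its own policy under a teacher's supervision. In multimodal reasoning, a common extension is to use a privileged teacher that observes training-time-only signals such as reference answers or rationales. The teacher's supervision may then depend on signals unavailable to the student, encouraging shortcut imitation rather than visually grounded reasoning. ViCuR is a visually grounded privileged-teacher distillation framework that replaces answer-side privilege with visual cues (query-related evidence in the input). Because these cues are derived from the same visual input that is available at inference, their evidence is recoverable by the student. To support this, ViCuR introduces a lightweight cue recovery module that uses dedicated sink-token cross-attention during prefill to aggregate task-relevant visual evidence into an internal representation. Across seven benchmarks with Qwen3-VL-2B and 8B students, ViCuR consistently improves over answer-based on-policy self-distillation by +1.19 and +1.24 in overall average performance. It also surpasses OPD baselines by +0.64 and +1.08, with consistent out-of-domain gains at the 8B scale. These results suggest that, in multimodal on-policy distillation, the design of teacher privilege is as important as teacher strength.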
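The idea of aggregating visual evidence through sink-token cross-attention can be sketched roughly as follows. The abstract gives no architectural details, so this is only an illustration under strong assumptions: a single attention head, no learned projections, and visual token features used as both keys and values. The names `sink_cross_attention`, `sink_queries`, and `visual_tokens` are hypothetical and do not come from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sink_cross_attention(sink_queries, visual_tokens):
    # Each sink query attends over all visual tokens (scaled dot-product)
    # and returns an attention-weighted average of their features.
    dim = len(visual_tokens[0])
    pooled = []
    for q in sink_queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in visual_tokens
        ]
        weights = softmax(scores)
        pooled.append(
            [sum(w * v[d] for w, v in zip(weights, visual_tokens)) for d in range(dim)]
        )
    return pooled

# Three toy visual tokens with 2-dimensional features and one sink query.
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sink_queries = [[4.0, 0.0]]
pooled = sink_cross_attention(sink_queries, visual_tokens)
```

Each pooled vector is a convex combination of the visual token features, so it stays within the range of the inputs; a real module would presumably add learned query, key, and value projections and run inside the model's prefill pass.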

