Fugu-MT Paper Translation (Abstract): DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Paper summary: DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

arxiv url: http://arxiv.org/abs/2605.15532v2
Date: Tue, 19 May 2026 17:39:21 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:08.370063
Title: DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation
Title (reference translation): DeltaPrompts: Avoiding the Zero-Delta Trap in Multimodal Distillation
Authors: Jaehun Jung, Hyunwoo Kim, Brandon Cui, Ximing Lu, David Acuna, Prithviraj Ammanabrolu, Yejin Choi
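Abstract summary: Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities. Up to 69% of the prompts in standard chart/document reasoning datasets are effectively zero-delta. The authors propose a staged synthesis pipeline that repurposes existing datasets as seeds and actively targets student failure modes to produce better prompts.
Reference score (attention score computed independently by this site): 49.98710755440242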
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart / document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$), demonstrating that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds, actively targeting student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts across three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement even on top of a highly-optimized reasoning model (e.g., Qwen3-VL-8B-Thinking) -- averaged over 10 benchmarks spanning chart, document and perception-centric reasoning.
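Abstract (reference translation): Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, but the prompts driving this process are typically chosen via simple heuristics or taken from off-the-shelf datasets. Up to 69% of the prompts in standard chart/document reasoning datasets are effectively zero-delta: the teacher and student already induce exactly the same answer distribution. Training on these prompts provides minimal learning signal, so student improvement saturates rapidly regardless of data scale. Distillation fundamentally minimizes distributional divergence, so a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$) and show that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds and actively targets student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts in three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all scenarios, DeltaPrompts yields substantial gains, with up to 15% relative improvement even on top of a highly optimized reasoning model (e.g., Qwen3-VL-8B-Thinking).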
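
Illustrative note: the abstract describes a prompt as zero-delta when the teacher and student induce the same answer distribution, but it does not give the formula for the answer divergence. The sketch below is therefore only one possible reading, under the assumption that several final answers are sampled per prompt from each model and that the gap is measured as the total variation distance between the two empirical answer distributions. The function names (`answer_distribution`, `answer_divergence`, `zero_delta_fraction`), the lowercase/strip normalization, and the tolerance `tol` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def answer_distribution(answers):
    """Empirical distribution over normalized final answers (assumed normalization)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip().lower() for a in answers)
    total = sum(counts.values())
    return {ans: n / total for ans, n in counts.items()}


def answer_divergence(teacher_answers, student_answers):
    """Total variation distance between teacher and student answer distributions.

    This is an assumed stand-in for the paper's answer divergence, not its definition.
    Returns a value in [0, 1]; 0 means the sampled answer distributions coincide.
    """
    p = answer_distribution(teacher_answers)
    q = answer_distribution(student_answers)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def zero_delta_fraction(samples, tol=1e-9):
    """Fraction of prompts whose divergence is at most tol.

    samples: list of (teacher_answers, student_answers) pairs, one pair per prompt.
    """
    if not samples:
        return 0.0
    zero = sum(1 for t, s in samples if answer_divergence(t, s) <= tol)
    return zero / len(samples)
```

Under this reading, a prompt on which both models always answer "42" has divergence 0 and would contribute no distillation signal, while a prompt on which the two models never agree has divergence 1.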

Related papers