Fugu-MT Paper Translation (Summary): TED: Training-Free Experience Distillation for Multimodal Reasoning

Paper summary: TED: Training-Free Experience Distillation for Multimodal Reasoning

arxiv url: http://arxiv.org/abs/2603.26778v1
Date: Wed, 25 Mar 2026 01:08:36 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.5983
Title: TED: Training-Free Experience Distillation for Multimodal Reasoning
Authors: Shuozhi Yuan, Jinqing Wang, Zihao Liu, Miaomiao Yuan, Haoran Peng, Jin Zhao, Bingwen Wang, Haoyi Wang
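Abstract summary: TED is a training-free, context-based distillation framework. It shifts the update target of distillation from model parameters to an in-context experience injected into the student's prompt.
Reference score (independently computed attention score): 9.796446482217418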
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge distillation is typically realized by transferring a teacher model's knowledge into a student's parameters through supervised or reinforcement-based optimization. While effective, such approaches require repeated parameter updates and large-scale training data, limiting their applicability in resource-constrained environments. In this work, we propose TED, a training-free, context-based distillation framework that shifts the update target of distillation from model parameters to an in-context experience injected into the student's prompt. For each input, the student generates multiple reasoning trajectories, while a teacher independently produces its own solution. The teacher then compares the student trajectories with its reasoning and the ground-truth answer, extracting generalized experiences that capture effective reasoning patterns. These experiences are continuously refined and updated over time. A key challenge of context-based distillation is unbounded experience growth and noise accumulation. TED addresses this with an experience compression mechanism that tracks usage statistics and selectively merges, rewrites, or removes low-utility experiences. Experiments on multimodal reasoning benchmarks MathVision and VisualPuzzles show that TED consistently improves performance. On MathVision, TED raises the performance of Qwen3-VL-8B from 0.627 to 0.702, and on VisualPuzzles from 0.517 to 0.561 with just 100 training samples. Under this low-data, no-update setting, TED achieves performance competitive with fully trained parameter-based distillation while reducing training cost by over 5x, demonstrating that meaningful knowledge transfer can be achieved through contextual experience.
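The loop described in the abstract (student trajectories, an independent teacher solution, experience extraction, and compression driven by usage statistics) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Experience`, `ExperiencePool`, and `distill_step`, the callables `student`, `teacher_solve`, and `teacher_extract`, the neutral prior of 0.5, and the "trajectory ends with the answer" correctness check are all assumptions made for the example. The abstract says that compression merges, rewrites, or removes low-utility experiences but does not say how; the sketch only de-duplicates and prunes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Experience:
    text: str        # generalized reasoning hint written by the teacher
    uses: int = 0    # times the hint was injected into a student prompt
    helped: int = 0  # uses in which at least one trajectory was correct

    def utility(self) -> float:
        # Assumption: unused experiences get a neutral score of 0.5.
        return self.helped / self.uses if self.uses else 0.5


@dataclass
class ExperiencePool:
    max_size: int = 8
    items: List[Experience] = field(default_factory=list)

    def render(self) -> str:
        """Format the pool as text to prepend to the student's prompt."""
        return "\n".join(f"- {e.text}" for e in self.items)

    def add(self, text: str) -> None:
        # Crude stand-in for merging: skip exact duplicates.
        if text and all(e.text != text for e in self.items):
            self.items.append(Experience(text))

    def record(self, correct: bool) -> None:
        # Update usage statistics for every experience that was in the prompt.
        for e in self.items:
            e.uses += 1
            e.helped += int(correct)

    def compress(self) -> None:
        # Keep the highest-utility experiences; drop the rest.
        self.items.sort(key=Experience.utility, reverse=True)
        del self.items[self.max_size:]


def distill_step(
    question: str,
    answer: str,
    student: Callable[[str], str],
    teacher_solve: Callable[[str], str],
    teacher_extract: Callable[[List[str], str, str], str],
    pool: ExperiencePool,
    n_trajectories: int = 4,
) -> None:
    """One training-free update: only the pool changes, never model weights."""
    prompt = f"{pool.render()}\n{question}" if pool.items else question
    trajectories = [student(prompt) for _ in range(n_trajectories)]
    # Assumption: a trajectory counts as correct if it ends with the answer.
    correct = any(t.strip().endswith(answer) for t in trajectories)
    pool.record(correct)
    teacher_solution = teacher_solve(question)
    pool.add(teacher_extract(trajectories, teacher_solution, answer))
    pool.compress()
```

In practice `student` and the two teacher callables would wrap calls to vision-language models and the question would carry an image; plain strings are used here only to keep the sketch self-contained.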

Related papers