Fugu-MT paper translation (abstract): AutoPrune: Each Complexity Deserves a Pruning Policy

Paper abstract: AutoPrune: Each Complexity Deserves a Pruning Policy

arxiv url: http://arxiv.org/abs/2509.23931v1
Date: Sun, 28 Sep 2025 15:09:00 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.540813
Title: AutoPrune: Each Complexity Deserves a Pruning Policy
Title (reference translation): AutoPrune: Each Complexity Deserves a Pruning Policy
Authors: Hanshi Wang, Yuhao Xu, Zekun Xu, Jin Gao, Yufan Liu, Weiming Hu, Ke Wang, Zhipeng Zhang
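Abstract summary: Complexity-Adaptive Pruning (AutoPrune) is a training-free, plug-and-play framework that tailors pruning policies to varying sample and task complexities. We evaluate AutoPrune on standard vision-language tasks and on Vision-Language-Action models for autonomous driving.
Reference score (independently computed attention score): 58.448785378705566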
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The established redundancy in visual tokens within large vision-language models allows pruning to effectively reduce their substantial computational demands. Previous methods typically employ heuristic layer-specific pruning strategies where, although the number of tokens removed may differ across decoder layers, the overall pruning schedule is fixed and applied uniformly to all input samples and tasks, failing to align token elimination with the model's holistic reasoning trajectory. Cognitive science indicates that human visual processing often begins with broad exploration to accumulate evidence before narrowing focus as the target becomes distinct. Our experiments reveal an analogous pattern in these models. This observation suggests that neither a fixed pruning schedule nor a heuristic layer-wise strategy can optimally accommodate the diverse complexities inherent in different inputs. To overcome this limitation, we introduce Complexity-Adaptive Pruning (AutoPrune), a training-free, plug-and-play framework that tailors pruning policies to varying sample and task complexities. Specifically, AutoPrune quantifies the mutual information between visual and textual tokens, then projects this signal to a budget-constrained logistic retention curve. Each such logistic curve, defined by its unique shape, corresponds to the specific complexity of different tasks and can guarantee adherence to predefined computational constraints. We evaluate AutoPrune on standard vision-language tasks and on Vision-Language-Action models for autonomous driving. Notably, when applied to LLaVA-1.5-7B, our method prunes 89% of visual tokens and reduces inference FLOPs by 76.8% while retaining 96.7% of the original accuracy averaged over all tasks. This corresponds to a 9.1% improvement over the recent work PDrop, demonstrating the effectiveness. Code is available at https://github.com/AutoLab-SAI-SJTU/AutoPrune.
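Abstract (reference translation): The well-established redundancy of visual tokens in large vision-language models allows pruning to substantially reduce their heavy computational demands. Previous methods typically adopt heuristic, layer-specific pruning strategies in which the number of removed tokens may differ across decoder layers, but the overall pruning schedule is fixed and applied uniformly to every input sample and task, so token removal fails to align with the model's holistic reasoning trajectory. Cognitive science indicates that human visual processing often begins with broad exploration to accumulate evidence and then narrows its focus as the target becomes distinct. Our experiments reveal an analogous pattern in these models. This observation suggests that neither a fixed pruning schedule nor a heuristic layer-wise strategy can optimally accommodate the varying complexity inherent in different inputs. To overcome this limitation, we introduce Complexity-Adaptive Pruning (AutoPrune), a training-free, plug-and-play framework that tailors pruning policies to varying sample and task complexities. Specifically, AutoPrune quantifies the mutual information between visual and textual tokens and projects this signal onto a budget-constrained logistic retention curve. Each such logistic curve, defined by its own shape, corresponds to the specific complexity of a task and can guarantee adherence to predefined computational constraints. We evaluate AutoPrune on standard vision-language tasks and on Vision-Language-Action models for autonomous driving. Notably, when applied to LLaVA-1.5-7B, our method prunes 89% of visual tokens and reduces inference FLOPs by 76.8% while retaining 96.7% of the original accuracy averaged over all tasks. This is a 9.1% improvement over the recent work PDrop, demonstrating its effectiveness. Code is available at https://github.com/AutoLab-SAI-SJTU/AutoPrune.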
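
The abstract describes projecting a complexity signal onto a budget-constrained logistic retention curve but gives no formulas. The sketch below is therefore only an illustration of that general idea, not the authors' implementation. Everything in it is an assumption: the function names, the linear mapping from a precomputed score `mi_score` in [0, 1] to the curve's steepness, the use of mean per-layer retention as the "budget", and the figure of 576 visual tokens used in the demo.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_curve(num_layers, steepness, midpoint):
    # Fraction of visual tokens kept at each decoder layer.
    # The curve decreases with depth: broad retention early, narrow later.
    return [_sigmoid(-steepness * (layer - midpoint)) for layer in range(num_layers)]


def retention_schedule(num_layers, mi_score, budget, k_min=0.2, k_max=2.0):
    # ASSUMPTION: mi_score in [0, 1] is a precomputed complexity signal that is
    # mapped linearly to the steepness of the curve. The paper's actual mapping
    # is not given in the abstract.
    steepness = k_min + (k_max - k_min) * mi_score

    # Bisect on the midpoint so that the mean retention over all layers equals
    # `budget`. Mean retention grows monotonically with the midpoint.
    lo, hi = -float(num_layers), 2.0 * num_layers
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        mean_keep = sum(logistic_curve(num_layers, steepness, mid)) / num_layers
        if mean_keep > budget:
            hi = mid  # keeping too much: move the drop to an earlier layer
        else:
            lo = mid  # keeping too little: move the drop to a later layer
    return logistic_curve(num_layers, steepness, 0.5 * (lo + hi))


if __name__ == "__main__":
    # Hypothetical setting: 32 decoder layers, 576 visual tokens, 30% budget.
    curve = retention_schedule(num_layers=32, mi_score=0.5, budget=0.3)
    print([round(576 * r) for r in curve])
```

In this sketch, two inputs with different `mi_score` values get differently shaped curves (a gradual decline versus a sharp cutoff), while the bisection step keeps the average retention at the same budget for both.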
