Fugu-MT Paper Translation (Abstract): Adaptive MLP Pruning for Large Vision Transformers

Paper overview: Adaptive MLP Pruning for Large Vision Transformers

arxiv url: http://arxiv.org/abs/2603.08100v1
Date: Mon, 09 Mar 2026 08:42:19 GMT
Status: translation complete
Last updated in system: 2026-03-10 15:13:15.711352
Title: Adaptive MLP Pruning for Large Vision Transformers
Title (reference translation): Adaptive MLP Pruning for Large Vision Transformers
Authors: Chengchao Shen
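Abstract summary: This paper proposes an adaptive pruning method that substantially reduces the parameters of large vision transformers without obvious performance degradation. Experiments on state-of-the-art large vision transformers, including CLIP and DINOv2, show that the method reduces parameters and FLOPs by roughly 40% in a nearly lossless manner.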
Reference score (independently computed attention metric): 2.821038594191455
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large vision transformers exhibit impressive scalability, as their performance improves substantially with increased model capacity. Nevertheless, their large parameter counts result in exorbitant computational and memory demands. By analyzing prevalent transformer structures, we find that multilayer perceptron (MLP) modules constitute the largest share of the model's parameters. In this paper, we propose an Adaptive MLP Pruning (AMP) method to substantially reduce the parameters of large vision transformers without obvious performance degradation. First, we adopt a Taylor-based method to evaluate the importance of MLP neurons. However, computing importance with a one-hot cross-entropy loss ignores the potential predictions on other categories, thus degrading the quality of the evaluated importance scores. To address this issue, we introduce a label-free information entropy criterion that fully models the predictions of the original model for more accurate importance evaluation. Second, we rank the hidden neurons of each MLP by these importance scores and apply a binary search algorithm to adaptively prune the ranked neurons according to the redundancy of different MLP modules, thereby avoiding the need for a predefined compression ratio. Experimental results on several state-of-the-art large vision transformers, including CLIP and DINOv2, demonstrate that our method achieves roughly 40% parameter and FLOPs reduction in a nearly lossless manner. Moreover, when the models are not fine-tuned after pruning, our method outperforms other pruning methods by a significantly large margin. The source code and trained weights are available at https://github.com/visresearch/AMP.
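The importance step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes a toy two-layer MLP with a ReLU hidden layer and no biases, takes the "label-free information entropy criterion" to mean the Shannon entropy of the model's own softmax prediction, and uses the common first-order Taylor score |a_j * dL/da_j| for hidden activation a_j. The function names `softmax`, `entropy_grad_wrt_logits`, and `neuron_importance` are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy_grad_wrt_logits(z):
    """Gradient of H(p) = -sum_k p_k log p_k with p = softmax(z).

    Closed form: dH/dz_k = -p_k * (log p_k + H). No label is needed.
    """
    p = softmax(z)
    h = -sum(pk * math.log(pk) for pk in p if pk > 0)
    return [-pk * (math.log(pk) + h) if pk > 0 else 0.0 for pk in p]


def neuron_importance(w1, w2, inputs):
    """Average first-order Taylor score |a_j * dL/da_j| per hidden neuron.

    w1: hidden x input weights, w2: classes x hidden weights (assumed toy MLP).
    L is the prediction entropy, so the scores use no ground-truth labels.
    """
    n_hidden = len(w1)
    scores = [0.0] * n_hidden
    for x in inputs:
        # Forward pass: ReLU hidden activations, then logits.
        a = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        z = [sum(w * aj for w, aj in zip(row, a)) for row in w2]
        gz = entropy_grad_wrt_logits(z)
        for j in range(n_hidden):
            # Backpropagate through the output layer: dL/da_j = sum_k w2[k][j] * dL/dz_k.
            ga = sum(w2[k][j] * gz[k] for k in range(len(w2)))
            scores[j] += abs(a[j] * ga)
    return [s / len(inputs) for s in scores]
```

In this sketch a neuron that never activates, or whose outgoing weights are all zero, receives a score of exactly zero, which matches the intuition that removing it cannot change the prediction.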
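Abstract (reference translation): Large vision transformers offer impressive scalability, since their performance can be substantially improved by increasing model capacity. Nevertheless, their large parameter counts lead to extreme computational and memory demands. Multilayer perceptron (MLP) modules account for the largest share of the model's parameters. This paper proposes an adaptive MLP pruning method that substantially reduces the parameters of large vision transformers without obvious performance degradation. First, a Taylor-based method is adopted to evaluate the importance of MLP neurons. However, computing importance with a one-hot cross-entropy loss ignores the potential predictions for other categories and lowers the quality of the resulting importance scores. To address this, a label-free information entropy criterion is introduced to model the original model's predictions for more accurate importance evaluation. Next, the hidden neurons of each MLP are ranked by these importance scores, and a binary search algorithm is applied according to the redundancy of the different MLP modules, avoiding a predefined compression ratio. Experiments on state-of-the-art large vision transformers, including CLIP and DINOv2, demonstrate that roughly 40% of parameters and FLOPs can be removed in a nearly lossless manner. Moreover, when the models are not fine-tuned after pruning, the method outperforms other pruning methods by a significantly large margin. The source code and trained weights are available at https://github.com/visresearch/AMP.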
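The ranking and binary search step described in the abstract can be sketched in the same hedged spirit. The abstract does not state what quantity the search tests, so this sketch assumes a hypothetical `degradation(k)` callback that measures how much a module's output degrades when only its top-k ranked neurons are kept, and assumes that this value does not increase as k grows. The names `rank_neurons`, `smallest_keep_count`, and `dropped_share`, as well as the toy scores and the tolerance, are illustrative only.

```python
from typing import Callable, List, Sequence


def rank_neurons(scores: Sequence[float]) -> List[int]:
    """Neuron indices sorted from most to least important."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def smallest_keep_count(
    n_neurons: int,
    degradation: Callable[[int], float],
    tolerance: float,
) -> int:
    """Binary-search the smallest k with degradation(k) <= tolerance.

    Assumes degradation(k) is non-increasing in k and that keeping all
    neurons (k == n_neurons) is acceptable.
    """
    lo, hi = 0, n_neurons
    while lo < hi:
        mid = (lo + hi) // 2
        if degradation(mid) <= tolerance:
            hi = mid  # top-mid neurons suffice; try keeping fewer
        else:
            lo = mid + 1  # too much was pruned; keep more
    return lo


# Toy usage: the stand-in degradation is the share of total importance removed.
scores = [0.50, 0.02, 0.30, 0.01, 0.15, 0.02]
order = rank_neurons(scores)
total = sum(scores)


def dropped_share(k: int) -> float:
    return sum(scores[j] for j in order[k:]) / total


k = smallest_keep_count(len(scores), dropped_share, tolerance=0.06)
kept = sorted(order[:k])
```

Because the search is run per module, a redundant MLP ends up with a small k and a less redundant one with a large k, so no global compression ratio has to be fixed in advance. Each search needs only about log2(n) evaluations of the callback.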
