Fugu-MT Paper Translation (Abstract): AdaMerge: Salience-Aware Adaptive Token Merging for Training-Free Acceleration of Vision Transformers

Paper abstract: AdaMerge: Salience-Aware Adaptive Token Merging for Training-Free Acceleration of Vision Transformers

arxiv url: http://arxiv.org/abs/2605.27465v1
Date: Tue, 26 May 2026 05:35:39 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.298448
Title: AdaMerge: Salience-Aware Adaptive Token Merging for Training-Free Acceleration of Vision Transformers
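Title (reference translation): AdaMerge: Salience-Aware Token Merging for Training-Free Acceleration of Vision Transformers
Authors: Semi Lee, Hyejin Go, Hyesong Choi
Abstract summary: AdaMerge is a token-merging framework based on two complementary mechanisms. On ImageNet-1k with ViT-B/16, it consistently outperforms ToMe, PiToMe, and DSM across all FLOPs-matched regimes.
Reference score (independently computed attention score): 5.739405014622565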
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The quadratic cost of self-attention in Vision Transformers (ViTs) constitutes a fundamental bottleneck for practical deployment, motivating a vibrant line of research on token reduction. Among existing approaches, token merging (ToMe) has emerged as an elegant training-free solution; yet its design rests on an unspoken premise of token equality, which contravenes the well-documented non-uniformity of self-attention and leads to information loss in high-salience tokens under aggressive compression. We address this limitation with AdaMerge, a token-merging framework based on two complementary mechanisms. First, salience-weighted similarity leverages column-wise feature-affinity centrality as a token-importance proxy and incorporates the resulting salience scores into the bipartite matching score, ensuring that pivotal tokens contribute more strongly to the merged representation. Second, adaptive merging intensity uses pre-computed layer-wise similarity statistics to dynamically modulate the per-layer reduction count in accordance with input-specific redundancy. On ImageNet-1k with ViT-B/16, AdaMerge consistently outperforms ToMe, PiToMe, and DSM across all FLOPs-matched regimes. The accuracy gap widens monotonically with compression: at the 13.4G FLOPs operating point, AdaMerge sustains a Top-1 degradation of only -1.06%, compared to -1.45% for PiToMe and -4.62% for DSM. To our knowledge, AdaMerge is the first to combine salience-weighted similarity and adaptive per-layer reduction into a single training-free token merging framework, advancing the accuracy-FLOPs Pareto frontier of ViT acceleration.
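Abstract (reference translation): The quadratic cost of self-attention in Vision Transformers (ViTs) is a fundamental bottleneck for practical deployment and has motivated an active line of research on token reduction. Among existing approaches, token merging (ToMe) has emerged as an elegant training-free solution, but its design rests on an implicit premise of token equality. This premise contradicts the well-documented non-uniformity of self-attention and causes information loss in high-salience tokens under aggressive compression. We address this limitation with AdaMerge, a token-merging framework based on two complementary mechanisms. First, salience-weighted similarity uses column-wise feature-affinity centrality as a token-importance proxy and incorporates the resulting salience scores into the bipartite matching score, ensuring that pivotal tokens contribute more strongly to the merged representation. Second, adaptive merging intensity uses pre-computed layer-wise similarity statistics to dynamically modulate the per-layer reduction count according to input-specific redundancy. On ImageNet-1k with ViT-B/16, AdaMerge consistently outperforms ToMe, PiToMe, and DSM across all FLOPs-matched regimes. The accuracy gap widens monotonically with compression: at the 13.4G FLOPs operating point, AdaMerge sustains a Top-1 degradation of only -1.06%, compared to -1.45% for PiToMe and -4.62% for DSM. To our knowledge, AdaMerge is the first work to combine salience-weighted similarity and adaptive per-layer reduction into a single training-free token-merging framework, advancing the accuracy-FLOPs Pareto frontier of ViT acceleration.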
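
Illustrative sketch (not the authors' implementation): the abstract names two mechanisms but gives no formulas, so the Python below is only one plausible reading of them. Everything specific here is an assumption made for illustration: the function names `salience_scores`, `adaptive_r`, and `merge_tokens`, the softmax normalization of centrality, the `penalty` term subtracted from the matching score, the ratio rule for the reduction count, and the even/odd bipartite split borrowed from ToMe-style matching. It uses only the standard library so it can be run as-is.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def salience_scores(tokens):
    """ASSUMED form of 'column-wise feature-affinity centrality':
    mean cosine affinity of all other tokens to token j, softmax-normalized."""
    n = len(tokens)
    raw = [
        sum(cosine(tokens[i], tokens[j]) for i in range(n) if i != j) / (n - 1)
        for j in range(n)
    ]
    peak = max(raw)
    exps = [math.exp(x - peak) for x in raw]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_r(base_r, input_mean_sim, ref_mean_sim, max_r):
    """ASSUMED rule: scale the per-layer reduction count by the input's mean
    similarity relative to a pre-computed reference statistic for that layer."""
    scaled = round(base_r * input_mean_sim / ref_mean_sim)
    return max(0, min(max_r, scaled))


def merge_tokens(tokens, r, penalty=1.0):
    """Bipartite matching with an ASSUMED salience-adjusted score, followed by
    a salience-weighted average of each merged pair. Needs at least 2 tokens."""
    sal = salience_scores(tokens)
    a_idx = list(range(0, len(tokens), 2))  # candidates to merge away
    b_idx = list(range(1, len(tokens), 2))  # merge destinations
    r = min(r, len(a_idx))

    edges = []
    for i in a_idx:
        best_j = max(b_idx, key=lambda j: cosine(tokens[i], tokens[j]))
        # High-salience tokens are made less likely to be merged away.
        score = cosine(tokens[i], tokens[best_j]) - penalty * sal[i]
        edges.append((score, i, best_j))
    edges.sort(reverse=True)
    chosen = edges[:r]

    # Running salience-weighted sums for every destination token.
    acc = {j: ([sal[j] * x for x in tokens[j]], sal[j]) for j in b_idx}
    for _, i, j in chosen:
        vec, weight = acc[j]
        acc[j] = ([v + sal[i] * x for v, x in zip(vec, tokens[i])], weight + sal[i])

    removed = {i for _, i, _ in chosen}
    merged = []
    for k, tok in enumerate(tokens):
        if k in removed:
            continue
        if k in acc:
            vec, weight = acc[k]
            merged.append([v / weight for v in vec])
        else:
            merged.append(list(tok))
    return merged


if __name__ == "__main__":
    demo = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    r = adaptive_r(base_r=1, input_mean_sim=0.6, ref_mean_sim=0.5, max_r=2)
    print(len(demo), "->", len(merge_tokens(demo, r)), "tokens")
```

In this sketch the salience weights play two roles: they lower the matching score of important source tokens, and they set each token's share in the merged average. A real ViT integration would also need to protect the class token and operate on batched tensors, which this toy version does not attempt.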

Related papers