Fugu-MT Paper Translation (Abstract): BiGain: Unified Token Compression for Joint Generation and Classification

Paper abstract: BiGain: Unified Token Compression for Joint Generation and Classification

arxiv url: http://arxiv.org/abs/2603.12240v1
Date: Thu, 12 Mar 2026 17:55:53 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.278075
Title: BiGain: Unified Token Compression for Joint Generation and Classification
Authors: Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen
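Abstract summary: BiGain is a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Its key insight is frequency separation, which disentangles fine detail from global semantics and enables compression that respects both generative fidelity and discriminative utility. The analyses indicate that balanced spectral retention, preserving high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models.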
Reference score (independently computed attention score): 47.040577759493004
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2) Interpolate-Extrapolate KV Downsampling, which downsamples keys/values via a controllable interextrapolation between nearest and average pooling while keeping queries intact, thereby conserving attention precision. Across DiT- and U-Net-based backbones and ImageNet-1K, ImageNet-100, Oxford-IIIT Pets, and COCO-2017, our operators consistently improve the speed-accuracy trade-off for diffusion-based classification, while maintaining or enhancing generation quality under comparable acceleration. For instance, on ImageNet-1K, with 70% token merging on Stable Diffusion 2.0, BiGain increases classification accuracy by 7.15% while improving FID by 0.34 (1.85%). Our analyses indicate that balanced spectral retention, preserving high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models. To our knowledge, BiGain is the first framework to jointly study and advance both generation and classification under accelerated diffusion, supporting lower-cost deployment.
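The second operator in the abstract, Interpolate-Extrapolate KV Downsampling, can be illustrated with a small sketch. The abstract only says that keys/values are downsampled by a controllable blend between nearest and average pooling while queries stay intact; the linear blend formula, the parameter name alpha, the 2x2 window, the choice of the top-left token as "nearest", and the function name interp_extrap_pool below are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch: the blend formula, names, and window layout are
# assumptions for illustration, not the paper's implementation.

def interp_extrap_pool(grid, alpha, stride=2):
    """Downsample a 2D grid of token vectors by `stride`.

    Each output token is  alpha * nearest + (1 - alpha) * average,
    where `nearest` is the top-left token of the window and `average`
    is the window mean. alpha in [0, 1] interpolates between the two
    poolings; values outside that range extrapolate.
    """
    height, width = len(grid), len(grid[0])
    dim = len(grid[0][0])
    out = []
    for r in range(0, height - stride + 1, stride):
        row = []
        for c in range(0, width - stride + 1, stride):
            # Collect the tokens of one stride x stride window, row-major.
            window = [grid[r + i][c + j]
                      for i in range(stride) for j in range(stride)]
            nearest = window[0]
            average = [sum(tok[d] for tok in window) / len(window)
                       for d in range(dim)]
            row.append([alpha * n + (1 - alpha) * a
                        for n, a in zip(nearest, average)])
        out.append(row)
    return out


# A 2x2 grid of 1-dimensional key/value tokens; queries would be left as-is.
kv = [[[1.0], [3.0]],
      [[5.0], [7.0]]]

avg_pooled = interp_extrap_pool(kv, alpha=0.0)      # pure average pooling
nearest_pooled = interp_extrap_pool(kv, alpha=1.0)  # pure nearest pooling
sharpened = interp_extrap_pool(kv, alpha=1.5)       # extrapolates past nearest
```

Under this assumed formula, alpha = 0 reduces to average pooling and alpha = 1 to nearest pooling, so a single scalar moves between a smoothed and an unsmoothed copy of each window. Applying the reduction only to keys and values shortens the attention context while every query position still produces its own output.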


Related papers