Fugu-MT Paper Translation (Summary): ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

Paper summary: ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

arxiv url: http://arxiv.org/abs/2605.22015v1
Date: Thu, 21 May 2026 05:23:21 GMT
Status: Translation complete
Last updated in system: 2026-05-22 20:14:18.521769
Title: ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration
Title (reference translation): ORBIS: Output-Guided Token Reduction via Distribution-Aware Matching for Video Diffusion Acceleration
Authors: Hangyeol Lee, Joo-Young Kim
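Abstract summary: Diffusion Transformer (DiT) has emerged as a powerful model architecture for generating high-quality images and videos. We propose ORBIS, an SW-HW co-designed accelerator for video DiT. ORBIS achieves about 2x higher token reduction ratio than the state-of-the-art approach, AsymRnR.
Reference score (independently calculated attention score): 2.918426765142262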
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Transformer (DiT) has emerged as a powerful model architecture for generating high-quality images and videos. In the case of video DiT, 3D Spatio-Temporal Attention increases token length in proportion to the number of frames, sharply increasing computational cost. Token reduction methods mitigate this cost by exploiting spatial redundancy, but existing approaches rely on inaccurate similarity estimates and lightweight matching algorithms, resulting in poor matching quality and only marginal acceleration. To overcome these limitations, we propose ORBIS, an SW-HW co-designed accelerator for video DiT. ORBIS leverages the output activation from the previous timestep to obtain more accurate inter-token similarity, substantially improving matching quality and enabling a higher token reduction ratio. We further introduce a Distribution-Aware Token Matching (DATM) algorithm that captures global token distribution and explicitly minimizes token-pair loss for additional gains. To fully hide DATM latency, we design specialized, deeply pipelined hardware and minimize its hardware cost through quantization, occupying only 2.4% of total area with negligible accuracy loss. Extensive experiments show that ORBIS achieves about 2x higher token reduction ratio than the state-of-the-art approach, AsymRnR, while delivering up to 4.5x speedup and 79.3% energy reduction compared to an NVIDIA A100 GPU.
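Abstract (reference translation): Diffusion Transformer (DiT) has emerged as a powerful model architecture for generating high-quality images and videos. For video DiT, 3D spatio-temporal attention increases the token length in proportion to the number of frames, sharply increasing computational cost. Token reduction methods mitigate this cost by exploiting spatial redundancy, but existing methods rely on inaccurate similarity estimates and lightweight matching algorithms, which yields poor matching quality and only marginal acceleration. To overcome these limitations, we propose ORBIS, an SW-HW co-designed accelerator for video DiT. ORBIS uses the output activations from the previous timestep to obtain more accurate inter-token similarity, substantially improving matching quality and achieving a higher token reduction ratio. We further introduce a Distribution-Aware Token Matching (DATM) algorithm that captures the global token distribution and explicitly minimizes token-pair loss for additional gains. To fully hide DATM latency, we design dedicated, deeply pipelined hardware and minimize its cost through quantization; it occupies only 2.4% of the total area with negligible accuracy loss. Extensive experiments show that ORBIS achieves about 2x higher token reduction ratio than the state-of-the-art approach, AsymRnR, while delivering up to 4.5x speedup and 79.3% energy reduction compared with an NVIDIA A100 GPU.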
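
The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of output-guided token reduction. It assumes a ToMe-style bipartite split of tokens into alternating source/destination sets and a simple reduce-and-restore scheme in which each dropped token reuses the output of its matched partner. The names `reduce_tokens` and `restore_tokens` and the alternating split are hypothetical, and the greedy per-token argmax matching is a lightweight stand-in, not the paper's DATM algorithm. The point it illustrates is that similarity is scored on the previous timestep's output activations (`prev_out`) rather than on the current input tokens.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (M x D) and b (K x D)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a_n @ b_n.T


def reduce_tokens(x: np.ndarray, prev_out: np.ndarray, r: int):
    """Hypothetical output-guided reduction: drop r tokens from x (N x D).

    Similarity is computed on prev_out, the previous timestep's output
    activations for the same N tokens, instead of on x itself.
    Returns the reduced tokens and, for each of the N original tokens,
    its row index in the reduced array (used by restore_tokens).
    """
    n = x.shape[0]
    src_idx = np.arange(0, n, 2)  # removal candidates (assumed alternating split)
    dst_idx = np.arange(1, n, 2)  # partners that are always kept
    if not 0 <= r <= len(src_idx):
        raise ValueError("r must be between 0 and the number of source tokens")

    sim = cosine_similarity(prev_out[src_idx], prev_out[dst_idx])
    best_dst = sim.argmax(axis=1)  # greedy: most similar dst for each src
    best_score = sim.max(axis=1)
    order = np.argsort(-best_score, kind="stable")  # most redundant src first
    merged, kept = order[:r], order[r:]

    keep = np.sort(np.concatenate([dst_idx, src_idx[kept]]))
    pos = np.full(n, -1)
    pos[keep] = np.arange(len(keep))
    # Each dropped src token points at the reduced row of its matched dst token.
    pos[src_idx[merged]] = pos[dst_idx[best_dst[merged]]]
    return x[keep], pos


def restore_tokens(reduced_out: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Scatter reduced outputs back to the full N-token sequence."""
    return reduced_out[pos]


if __name__ == "__main__":
    prev = np.eye(8)
    prev[0] = prev[1]  # tokens 0 and 1 had identical outputs at the previous step
    prev[2] = prev[3]  # so did tokens 2 and 3
    x = np.arange(32, dtype=float).reshape(8, 4)
    reduced, pos = reduce_tokens(x, prev, r=2)
    print(reduced.shape, pos.tolist())  # (6, 4) [0, 0, 1, 1, 2, 3, 4, 5]
```

Tokens 0 and 2 are dropped because their previous-step outputs match tokens 1 and 3; after a block processes the 6 remaining tokens, `restore_tokens` copies the outputs of tokens 1 and 3 back into positions 0 and 2. According to the abstract, ORBIS instead uses DATM, which considers the global token distribution and minimizes token-pair loss, and runs it on dedicated pipelined hardware.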

Related papers