Fugu-MT Paper Translation (Abstract): Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers

Paper overview: Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers

arxiv url: http://arxiv.org/abs/2508.16211v1
Date: Fri, 22 Aug 2025 08:34:03 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.320297
Title: Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers
Authors: Shikang Zheng, Liang Feng, Xinyu Wang, Qinming Zhou, Peiliang Cai, Chang Zou, Jiacheng Liu, Yuqi Lin, Junjie Chen, Yue Ma, Linfeng Zhang
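Abstract summary: Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. Current feature caching methods often struggle to maintain generation quality at high acceleration ratios. This paper proposes FoCa, which treats feature caching as a feature-ODE solving problem.
Reference score (attention level, computed independently by this site): 19.107716099809707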
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. To reduce their substantial computational costs, feature caching techniques have been proposed to accelerate inference by reusing hidden representations from previous timesteps. However, current methods often struggle to maintain generation quality at high acceleration ratios, where prediction errors increase sharply due to the inherent instability of long-step forecasting. In this work, we adopt an ordinary differential equation (ODE) perspective on the hidden-feature sequence, modeling layer representations along the trajectory as a feature-ODE. We attribute the degradation of existing caching strategies to their inability to robustly integrate historical features under large skipping intervals. To address this, we propose FoCa (Forecast-then-Calibrate), which treats feature caching as a feature-ODE solving problem. Extensive experiments on image synthesis, video generation, and super-resolution tasks demonstrate the effectiveness of FoCa, especially under aggressive acceleration. Without additional training, FoCa achieves near-lossless speedups of 5.50 times on FLUX, 6.45 times on HunyuanVideo, 3.17 times on Inf-DiT, and maintains high quality with a 4.53 times speedup on DiT.
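The difference between reusing a cached feature and forecasting it from cached history can be illustrated with a toy sketch. This is not FoCa's algorithm: the abstract does not specify the solver, and the calibration stage is not sketched here. `expensive_block`, `run`, and the `interval` parameter are hypothetical names, and the "feature" is a smooth two-dimensional curve standing in for a layer's hidden representation. The forecast shown is plain linear extrapolation from the two most recent fully computed features.

```python
import math


def expensive_block(t):
    """Hypothetical stand-in for a costly DiT layer: a smooth feature at time t."""
    return [math.sin(t), math.cos(t)]


def run(num_steps=40, interval=4, dt=0.05, forecast=True):
    """Return the mean absolute feature error versus computing every step.

    Every `interval`-th step is computed in full and cached; the steps in
    between either reuse the latest cached feature or extrapolate linearly
    from the two latest cached features.
    """
    history = []  # up to two (time, feature) pairs from full computations
    total_err = 0.0
    for step in range(num_steps):
        t = step * dt
        exact = expensive_block(t)  # evaluated here only to measure the error
        if step % interval == 0:
            feat = exact
            history = (history + [(t, feat)])[-2:]
        elif forecast and len(history) == 2:
            (t0, f0), (t1, f1) = history
            # Finite-difference slope between cached features, extended to t.
            feat = [b + (b - a) / (t1 - t0) * (t - t1) for a, b in zip(f0, f1)]
        else:
            feat = history[-1][1]  # naive caching: reuse the last feature
        total_err += sum(abs(x - y) for x, y in zip(feat, exact)) / len(exact)
    return total_err / num_steps


reuse_err = run(forecast=False)
forecast_err = run(forecast=True)
print(f"reuse: {reuse_err:.4f}  forecast: {forecast_err:.4f}")
```

On this smooth toy trajectory the extrapolated features track the exact ones more closely than plain reuse. Real hidden features are far less regular, and the abstract notes that prediction errors grow sharply as the skipping interval increases.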
