Fugu-MT Paper Translation (Abstract): CoD-Lite: Real-Time Diffusion-Based Generative Image Compression

Paper overview: CoD-Lite: Real-Time Diffusion-Based Generative Image Compression

arxiv url: http://arxiv.org/abs/2604.12525v2
Date: Wed, 15 Apr 2026 01:25:27 GMT
Status: Translation complete
Last updated in system: 2026-04-16 13:09:57.530245
Title: CoD-Lite: Real-Time Diffusion-Based Generative Image Compression
Title (reference translation): CoD-Lite: Real-Time Diffusion-Based Generative Image Compression
Authors: Zhaoyang Jia, Naifu Xue, Zihan Zheng, Jiahao Li, Bin Li, Xiaoyi Zhang, Zongyu Guo, Yuan Zhang, Houqiang Li, Yan Lu
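Abstract summary: We explore the design of real-time, lightweight diffusion codecs. We find that generation-oriented pre-training is less effective at small model scales, whereas compression-oriented pre-training consistently improves performance.
Reference score (independently computed attention score): 58.56387132156189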
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advanced diffusion methods typically derive strong generative priors by scaling diffusion transformers. However, scaling fails to generalize when adapted for real-time compression scenarios that demand lightweight models. In this paper, we explore the design of real-time and lightweight diffusion codecs by addressing two pivotal questions. First, does diffusion pre-training benefit lightweight diffusion codecs? Through systematic analysis, we find that generation-oriented pre-training is less effective at small model scales, whereas compression-oriented pre-training yields consistently better performance. Second, are transformers essential? We find that while global attention is crucial for standard generation, lightweight convolutions suffice for compression-oriented diffusion when paired with distillation. Guided by these findings, we establish a one-step lightweight convolution diffusion codec that achieves real-time 60 FPS encoding and 42 FPS decoding at 1080p. Further enhanced by distillation and adversarial learning, the proposed codec reduces bitrate by 85% at a comparable FID to MS-ILLM, bridging the gap between generative compression and practical real-time deployment. Code is released at https://github.com/microsoft/GenCodec/tree/main/CoD_Lite
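Abstract (reference translation): Recent diffusion methods derive strong generative priors by scaling diffusion transformers. However, scaling fails to generalize when adapted to real-time compression scenarios that require lightweight models. This paper explores the design of real-time, lightweight diffusion codecs by addressing two key questions. First, does diffusion pre-training benefit lightweight diffusion codecs? Systematic analysis shows that generation-oriented pre-training is less effective at small model scales, whereas compression-oriented pre-training consistently improves performance. Second, are transformers essential? While global attention is crucial for standard generation, lightweight convolutions suffice for compression-oriented diffusion when combined with distillation. Based on these findings, we build a one-step lightweight convolutional diffusion codec that achieves 60 FPS encoding and 42 FPS decoding at 1080p. Further enhanced by distillation and adversarial learning, the proposed codec reduces bitrate by 85% at a comparable FID to MS-ILLM, bridging the gap between generative compression and practical real-time deployment. Code is available at https://github.com/microsoft/GenCodec/tree/main/CoD_Lite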
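
To put the throughput and bitrate figures quoted in the abstract in more concrete terms, the following is a minimal arithmetic sketch, not code from the paper or its repository. It assumes "1080p" means 1920x1080 and that frames are processed one at a time with no pipelining; the function names are hypothetical, and the abstract does not state the hardware on which the frame rates were measured.

```python
# Figures quoted in the abstract.
ENCODE_FPS = 60
DECODE_FPS = 42
BITRATE_REDUCTION = 0.85  # versus MS-ILLM at a comparable FID

# Assumption: "1080p" is taken to mean 1920x1080 pixels.
WIDTH, HEIGHT = 1920, 1080


def frame_budget_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


def megapixels_per_second(fps: float, width: int = WIDTH, height: int = HEIGHT) -> float:
    """Pixel throughput implied by a frame rate at a given resolution."""
    return fps * width * height / 1e6


def relative_bitrate(reduction: float) -> float:
    """Fraction of the baseline bitrate that remains after a reduction."""
    return 1.0 - reduction


if __name__ == "__main__":
    print(f"encode budget: {frame_budget_ms(ENCODE_FPS):.1f} ms/frame")
    print(f"decode budget: {frame_budget_ms(DECODE_FPS):.1f} ms/frame")
    print(f"encode throughput: {megapixels_per_second(ENCODE_FPS):.1f} MP/s")
    print(f"decode throughput: {megapixels_per_second(DECODE_FPS):.1f} MP/s")
    print(f"bitrate vs. baseline: {relative_bitrate(BITRATE_REDUCTION):.2f}x")
```

Under these assumptions, 60 FPS encoding corresponds to roughly 16.7 ms per frame and 42 FPS decoding to roughly 23.8 ms per frame, while an 85% bitrate reduction means the codec uses about 0.15 times the baseline bitrate at a comparable FID.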

