Fugu-MT paper translation (abstract): ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

Paper overview: ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

arxiv url: http://arxiv.org/abs/2606.00111v1
Date: Wed, 27 May 2026 04:36:08 GMT
Status: translation complete
Last updated in system: 2026-06-02 21:34:27.966524
Title: ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression
Title (reference translation): ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression
Authors: Haisheng Fu, Runyu Yang, Feng Ding, Siyu Zhu, Jie Liang, Xiaoxiao Li, Zhenman Fang, Jingning Han
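Abstract summary: Channel-wise wavelet transforms are introduced into both the transformer and entropy-coding components. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively.
Reference score (attention level, computed independently by this site): 36.75128193748412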
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To further improve rate-distortion performance, we introduce channel-wise wavelet transforms into both the transformer and entropy-coding components. First, we propose a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features before mapping the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) therefore preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, we introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. When each channel-wise subband is divided into two slices, we use eight slices for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme still retains most of the coding gains with lower complexity. The results confirm the advantage of introducing wavelet transform in CNN-transformer-based LIC schemes.
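The attention mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a one-level orthonormal Haar transform along the channel axis, a single attention head, a single window, and random projection weights. The function names (`haar_channels`, `wavelet_domain_window_attention`) and the sizes (an 8x8 window with 32 channels) are hypothetical choices made only for illustration.

```python
import numpy as np

def haar_channels(x):
    """One-level orthonormal Haar transform along the last (channel) axis (assumed wavelet)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.concatenate([low, high], axis=-1)

def inverse_haar_channels(y):
    """Inverse of haar_channels: interleave the reconstructed even/odd channels."""
    half = y.shape[-1] // 2
    low, high = y[..., :half], y[..., half:]
    x = np.empty_like(y)
    x[..., 0::2] = (low + high) / np.sqrt(2.0)
    x[..., 1::2] = (low - high) / np.sqrt(2.0)
    return x

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def wavelet_domain_window_attention(tokens, w_q, w_k, w_v):
    """tokens: (num_tokens, channels) belonging to one spatial attention window."""
    coeffs = haar_channels(tokens)                  # channel-wise wavelet domain
    q, k, v = coeffs @ w_q, coeffs @ w_k, coeffs @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # spatial self-attention in the window
    return inverse_haar_channels(weights @ v)       # map the output back with the inverse

rng = np.random.default_rng(0)
window = rng.standard_normal((64, 32))  # hypothetical 8x8 window, 32 channels
w_q, w_k, w_v = (rng.standard_normal((32, 32)) / np.sqrt(32) for _ in range(3))
out = wavelet_domain_window_attention(window, w_q, w_k, w_v)
```

The tokens are still the spatial positions inside the window, so the tokenization pattern is unchanged; only the channel basis seen by the Q/K/V projections differs.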
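Abstract (reference translation): State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To improve rate-distortion performance, channel-wise wavelet transforms are introduced into both the transformer and the entropy-coding components. First, a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism is proposed. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features and then maps the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, a channel-wise wavelet packet (ChWP) decomposition is introduced that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. When each channel-wise subband is divided into two slices, eight slices are used for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme retains most of the coding gains at lower complexity. The results confirm the advantage of introducing wavelet transforms in CNN-transformer-based LIC schemes.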
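The slice arithmetic in the entropy-coding stage (four equal-sized subbands, two slices each, eight slices in total) can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: it assumes a two-level Haar wavelet packet along the channel axis, in which both the low and the high band are split again, and a hypothetical latent of 320 channels at 16x16 resolution. The helper names are invented for this example.

```python
import numpy as np

def haar_split(x):
    """Split along the channel axis (axis 0) into orthonormal Haar low/high bands."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def channel_wavelet_packet(latent):
    """latent: (C, H, W). Two-level packet: split BOTH first-level bands again,
    which yields four subbands of C/4 channels each (assumed structure)."""
    low, high = haar_split(latent)
    return [*haar_split(low), *haar_split(high)]

def make_slices(subbands, slices_per_subband):
    """Divide every subband into equal channel groups for slice-based entropy coding."""
    slices = []
    for band in subbands:
        slices.extend(np.array_split(band, slices_per_subband, axis=0))
    return slices

latent = np.random.default_rng(0).standard_normal((320, 16, 16))  # hypothetical size
subbands = channel_wavelet_packet(latent)
slices = make_slices(subbands, 2)           # two slices per subband -> eight slices
single = make_slices(subbands, 1)           # lower-complexity variant -> four slices
```

A plain dyadic wavelet transform would instead split only the low band at the second level and give subbands of unequal size (C/2, C/4, C/4), which is why a packet decomposition is the natural match for equal-sized channel slices.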
