Fugu-MT Paper Translation (Summary): Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression

Paper overview: Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression

arxiv url: http://arxiv.org/abs/2605.14391v1
Date: Thu, 14 May 2026 05:13:23 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.637199
Title: Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression
Authors: Qi Mao, Zijian Wang, Zhengxue Cheng, Lingyu Zhu, Siwei Ma
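Abstract summary: This paper proposes a dual-latent collaborative decoding framework that decomposes reconstruction responsibilities across complementary latent paradigms. MoDE treats the SQ branch as a fidelity-oriented expert and the VQ branch as a perception-oriented expert, and coordinates them through two decoder-side modules. The framework supports selective cross-latent collaboration under a shared dual-stream bitstream and enables both fidelity-anchored and perception-anchored decoding.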
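The plain-Python sketch below roughly illustrates fidelity-anchored versus perception-anchored decoding. It assumes that each expert produces a feature vector. It also assumes that cross-latent collaboration can be modeled as a per-channel gated transfer from the non-anchor expert into the anchor expert. The names `gated_transfer` and `anchored_decode`, the sigmoid gate, and the example values are all hypothetical. The summary does not describe how MoDE's two decoder-side modules are actually implemented.

```python
import math
from typing import List, Sequence


def stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_transfer(anchor: Sequence[float], reference: Sequence[float],
                   gate_logits: Sequence[float]) -> List[float]:
    """Hypothetical per-channel blend: a gate near 0 keeps the anchor
    expert's value, a gate near 1 moves it toward the reference expert."""
    if not (len(anchor) == len(reference) == len(gate_logits)):
        raise ValueError("anchor, reference and gate_logits must have equal length")
    return [a + stable_sigmoid(z) * (r - a)
            for a, r, z in zip(anchor, reference, gate_logits)]


def anchored_decode(sq_feat: Sequence[float], vq_feat: Sequence[float],
                    gate_logits: Sequence[float],
                    anchor: str = "fidelity") -> List[float]:
    """Choose which expert anchors the output; the other expert contributes
    only through the gated transfer."""
    if anchor == "fidelity":      # SQ features anchor, VQ features are gated in
        return gated_transfer(sq_feat, vq_feat, gate_logits)
    if anchor == "perception":    # VQ features anchor, SQ features are gated in
        return gated_transfer(vq_feat, sq_feat, gate_logits)
    raise ValueError(f"unknown anchor: {anchor!r}")


if __name__ == "__main__":
    sq = [0.9, 0.1, -0.4]       # hypothetical fidelity-expert features
    vq = [1.2, 0.6, -0.4]       # hypothetical perception-expert features
    logits = [-4.0, 2.0, 0.0]   # in a learned model these would be predicted
    print(anchored_decode(sq, vq, logits, "fidelity"))
    print(anchored_decode(sq, vq, logits, "perception"))
```

Setting every gate logit to a large negative value collapses either mode to its anchor expert alone. This is one simple way to picture "selective" collaboration. Whether MoDE gates per channel, per location, or some other way is not stated here.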
Reference score (independently computed attention score): 35.48235920552014
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learned image compression (LIC) increasingly requires reconstructions that balance distortion fidelity and perceptual realism across a wide range of bitrates. However, most existing methods still rely on a single compressed latent representation to simultaneously carry structural details, semantic cues, and perceptual priors, requiring the same latent representation to serve multiple, potentially conflicting roles. This tension becomes evident across different latent paradigms: scalar-quantized (SQ) continuous latents provide rate-scalable fidelity but tend to lose perceptual details at low rates, while vector-quantized (VQ) discrete tokens preserve compact semantic cues but suffer from limited structural fidelity and bitrate scalability. To address this issue, we propose Mixture of Decoder Experts (MoDE), a dual-latent collaborative decoding framework that decomposes reconstruction responsibilities across complementary latent paradigms. Specifically, MoDE treats the SQ branch as a fidelity-oriented expert and the VQ branch as a perception-oriented expert, and coordinates them through two decoder-side modules: Expert-Specific Enhancement (ESE), which preserves branch-specific expert references, and Cross-Expert Modulation (CEM), which enables selective complementary transfer during reconstruction. The resulting framework supports selective cross-latent collaboration under a shared dual-stream bitstream and enables both fidelity-anchored and perception-anchored decoding. Extensive experiments demonstrate that MoDE achieves a more favorable fidelity-perception balance than representative distortion-oriented, perception-oriented, generative, and dual-latent baselines across a wide bitrate range, highlighting decoder-side expert collaboration as an effective design for wide-range fidelity-perception balanced LIC.
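The toy Python sketch below makes the SQ/VQ contrast in the abstract concrete, assuming a 1-D latent and a hand-written codebook. Scalar quantization rounds each element to a multiple of a step size, so shrinking the step lowers reconstruction error at the cost of more distinct symbols (more bits). Vector quantization replaces each group of elements with the index of its nearest codeword. A fixed-length index therefore costs log2(codebook size) bits, and each group can only be reconstructed as one of the codewords. The function names, latent values, and codebook are hypothetical. Real LIC codecs learn the transforms and codebooks and entropy-code the symbols, and this is not MoDE's implementation.

```python
import math
from typing import List, Sequence


def scalar_quantize(latent: Sequence[float], step: float) -> List[int]:
    """SQ: round each element independently to the nearest multiple of `step`."""
    return [round(x / step) for x in latent]


def scalar_dequantize(symbols: Sequence[int], step: float) -> List[float]:
    """Map integer SQ symbols back to continuous values."""
    return [s * step for s in symbols]


def vector_quantize(latent: Sequence[float],
                    codebook: Sequence[Sequence[float]]) -> List[int]:
    """VQ: split the latent into codeword-sized vectors and keep only the
    index of the nearest codeword (squared Euclidean distance)."""
    dim = len(codebook[0])
    if len(latent) % dim != 0:
        raise ValueError("latent length must be a multiple of the codeword size")
    indices = []
    for i in range(0, len(latent), dim):
        vec = latent[i:i + dim]
        dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=lambda k: dists[k]))
    return indices


def vector_dequantize(indices: Sequence[int],
                      codebook: Sequence[Sequence[float]]) -> List[float]:
    """Replace each index by its codeword; only len(codebook) outputs exist per vector."""
    out: List[float] = []
    for k in indices:
        out.extend(codebook[k])
    return out


def bits_per_vq_index(codebook: Sequence[Sequence[float]]) -> float:
    """Fixed-length cost of one VQ index, independent of the input content."""
    return math.log2(len(codebook))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    latent = [0.12, -0.73, 1.58, 0.40]   # hypothetical latent values
    for step in (1.0, 0.25):             # coarse vs fine SQ step
        recon = scalar_dequantize(scalar_quantize(latent, step), step)
        print(f"SQ step={step}: recon={recon}, mse={mse(latent, recon):.4f}")

    codebook = [[0.0, 0.0], [0.0, -1.0], [1.5, 0.5], [-1.0, 1.0]]  # hypothetical
    idx = vector_quantize(latent, codebook)
    recon = vector_dequantize(idx, codebook)
    print(f"VQ indices={idx}, recon={recon}, mse={mse(latent, recon):.4f}, "
          f"bits/index={bits_per_vq_index(codebook):.0f}")
```

Changing `step` lets SQ move smoothly along the rate-distortion curve. The VQ side can only change its rate by swapping in a different codebook. This is one simple way to picture the bitrate-scalability limitation the abstract attributes to VQ tokens.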


Related papers