Fugu-MT Paper Translation (Summary): GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization

Paper summary: GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization

arxiv url: http://arxiv.org/abs/2505.13542v1
Date: Mon, 19 May 2025 00:18:27 GMT
Status: Translation complete
System-internal update date: 2025-05-21 14:49:52.383988
Title: GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization
Title (reference translation): GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization
Authors: Karthik Sivakoti
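Abstract summary: GANCompress is a novel neural compression framework that combines Binary Spherical Quantization (BSQ) with Generative Adversarial Networks (GANs). GANCompress substantially improves compression efficiency, reducing file sizes by up to 100x with minimal visual distortion.
Reference score (independently computed attention score): 0.0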
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The exponential growth of visual data in digital communications has intensified the need for efficient compression techniques that balance rate-distortion performance with computational feasibility. While recent neural compression approaches have shown promise, they still struggle with fundamental challenges: preserving perceptual quality at high compression ratios, computational efficiency, and adaptability to diverse visual content. This paper introduces GANCompress, a novel neural compression framework that synergistically combines Binary Spherical Quantization (BSQ) with Generative Adversarial Networks (GANs) to address these challenges. Our approach employs a transformer-based autoencoder with an enhanced BSQ bottleneck that projects latent representations onto a hypersphere, enabling efficient discretization with bounded quantization error. This is followed by a specialized GAN architecture incorporating frequency-domain attention and color consistency optimization. Experimental results demonstrate that GANCompress achieves substantial improvement in compression efficiency -- reducing file sizes by up to 100x with minimal visual distortion. Our method outperforms traditional codecs like H.264 by 12-15% in perceptual metrics while maintaining comparable PSNR/SSIM values, with 2.4x faster encoding and decoding speeds. On standard benchmarks including ImageNet-1k and COCO2017, GANCompress sets a new state-of-the-art, reducing FID from 0.72 to 0.41 (43% improvement) compared to previous methods while maintaining higher throughput. This work presents a significant advancement in neural compression technology with promising applications for real-time visual communication systems.
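Abstract (reference translation): The exponential growth of visual data in digital communications has increased the need for efficient compression techniques that balance rate-distortion performance against computational feasibility. Recent neural compression approaches show promise, but they still struggle with fundamental challenges: preserving perceptual quality at high compression ratios, computational efficiency, and adaptability to diverse visual content. This paper introduces GANCompress, a novel neural compression framework that addresses these challenges by synergistically combining Binary Spherical Quantization (BSQ) with Generative Adversarial Networks (GANs). The proposed method uses a transformer-based autoencoder with an enhanced BSQ bottleneck that projects latent representations onto a hypersphere, enabling efficient discretization with bounded quantization error. This is followed by a specialized GAN architecture that incorporates frequency-domain attention and color consistency optimization. Experimental results show that GANCompress substantially improves compression efficiency, reducing file sizes by up to 100x with minimal visual distortion. The method outperforms traditional codecs such as H.264 by 12-15% in perceptual metrics while maintaining comparable PSNR/SSIM values, with 2.4x faster encoding and decoding speeds. On standard benchmarks including ImageNet-1k and COCO2017, GANCompress sets a new state of the art, reducing FID from 0.72 to 0.41 (a 43% improvement) compared to previous methods while maintaining higher throughput. This work represents a significant advance in neural compression technology, with promising applications for real-time visual communication systems.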
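
The abstract describes the BSQ bottleneck only at a high level: latent vectors are projected onto a hypersphere and then discretized with bounded quantization error. The following is a minimal sketch of that general idea, not the paper's "enhanced" BSQ bottleneck, whose details are not given here. The function names (`bsq_quantize`, `quantization_error`, `error_bound`) and the latent dimension of 16 are illustrative assumptions. The bound used in the sketch follows from the geometry: for a unit vector u in L dimensions and its quantized version û = sign(u)/√L, the inner product u·û equals ‖u‖₁/√L, which is at least 1/√L, so ‖u − û‖² = 2 − 2(u·û) ≤ 2 − 2/√L.

```python
import math
import random


def bsq_quantize(v):
    """Project v onto the unit hypersphere, then snap it to the nearest binary corner.

    Hypothetical helper: a sketch of plain binary spherical quantization,
    not the enhanced bottleneck described in the abstract.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    u = [x / norm for x in v]                        # point on the unit hypersphere
    bits = [1 if x >= 0 else 0 for x in u]           # one bit per latent dimension
    scale = 1.0 / math.sqrt(len(v))
    u_hat = [scale if b else -scale for b in bits]   # quantized point, also unit norm
    return u, bits, u_hat


def quantization_error(u, u_hat):
    """Euclidean distance between the normalized latent and its quantized version."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, u_hat)))


def error_bound(dim):
    """Worst-case distance sqrt(2 - 2/sqrt(dim)), derived in the lead-in above."""
    return math.sqrt(2.0 - 2.0 / math.sqrt(dim))


rng = random.Random(0)   # fixed seed so the run is repeatable
dim = 16                 # assumed latent dimension, for illustration only
v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
u, bits, u_hat = bsq_quantize(v)
print("bits:", bits)
print("error:", quantization_error(u, u_hat), "bound:", error_bound(dim))
```

In this sketch each latent vector is stored as `dim` bits, and the decoder would receive `u_hat` rather than `u`. The bound is reached only when `u` lies along a single coordinate axis, so typical errors are smaller.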


Related papers