Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
- URL: http://arxiv.org/abs/2504.21814v1
- Date: Wed, 30 Apr 2025 17:20:14 GMT
- Title: Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
- Authors: Yixin Gao, Xiaohan Pan, Xin Li, Zhibo Chen
- Abstract summary: AIGC foundation models are powerful enough to faithfully generate intricate structure and fine-grained details from nothing more than compact descriptors. OpenAI's recent GPT-4o image generation has achieved impressive cross-modality generation, editing, and design capabilities.
- Score: 14.805239427360208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of AIGC foundation models has revolutionized the paradigm of image compression, paving the way for the abandonment of most pixel-level transform and coding and compelling us to ask: why compress what you can generate, if the AIGC foundation model is powerful enough to faithfully generate intricate structures and fine-grained details from nothing more than compact descriptors, i.e., texts or cues? Fortunately, OpenAI's recent GPT-4o image generation has achieved impressive cross-modality generation, editing, and design capabilities, which motivates us to answer the above question by exploring its potential in the image compression field. In this work, we investigate two typical compression paradigms: textual coding and multimodal coding (i.e., text + an extremely low-resolution image), where all or most pixel-level information is generated rather than compressed, via the advanced GPT-4o image generation function. The essential challenge lies in maintaining semantic and structural consistency during decoding. To overcome this, we propose a structural raster-scan prompt engineering mechanism that transforms the image into textual space, which is then compressed as the condition for GPT-4o image generation. Extensive experiments show that the combination of our designed structural raster-scan prompts and GPT-4o's image generation function achieves impressive performance compared with recent multimodal/generative image compression methods at ultra-low bitrates, further indicating the potential of AIGC generation in the image compression field.
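To make the textual coding paradigm above concrete, here is a minimal sketch (not the authors' code): the entire bitstream is an entropy-coded structured text description, and every pixel is regenerated at the decoder. The example description, helper names, and resolution are illustrative assumptions.

```python
# Minimal sketch of the "textual coding" paradigm: the only bitstream is a
# compressed structured text description; all pixels are regenerated at the
# decoder. Description text and helper names are illustrative assumptions.
import zlib

def encode_text_descriptor(description: str) -> bytes:
    """Entropy-code the structured description; this IS the bitstream."""
    return zlib.compress(description.encode("utf-8"), level=9)

def bits_per_pixel(bitstream: bytes, width: int, height: int) -> float:
    return len(bitstream) * 8 / (width * height)

# A raster-scan-style structured description (hypothetical example).
description = (
    "Top-left: snow-capped mountain ridge under overcast sky; "
    "center: pine forest sloping down to a lake; "
    "bottom-right: wooden cabin with smoke rising from the chimney."
)

bitstream = encode_text_descriptor(description)
print(f"{len(bitstream)} bytes -> {bits_per_pixel(bitstream, 768, 512):.5f} bpp")

# Decoding would condition an image generator on the recovered text, e.g.
# passing zlib.decompress(bitstream).decode() as the prompt to GPT-4o's
# image generation; omitted here to keep the sketch self-contained.
```

At this scale the rate is dominated by the length of the description, which is exactly what a structural raster-scan prompt must keep compact while preserving spatial layout.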
Related papers
- Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression [7.643300240138419]
We introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities. Our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information. Our method proficiently restores both semantic and visual details, competing against baseline approaches at extremely low bitrates.
arXiv Detail & Related papers (2024-12-17T15:01:35Z)
- Towards Defining an Efficient and Expandable File Format for AI-Generated Contents [23.217964968742823]
We propose a new file format for AIGC images, named AIGIF, enabling ultra-low-bitrate coding of AIGC images. We experimentally find that a well-designed composable bitstream structure incorporating the above three factors can achieve an impressive compression ratio of up to 1/10,000.
arXiv Detail & Related papers (2024-10-13T13:29:30Z)
- Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption [52.82508784748278]
This paper proposes a Controllable Generative Image Compression framework, termed Control-GIC.
Control-GIC is capable of fine-grained bitrate adaption across a broad spectrum while ensuring high-fidelity and general compression.
We develop a conditional decoder capable of retrieving historic multi-granularity representations according to the encoded codes, and then reconstructing hierarchical features under a conditional probability formulation.
arXiv Detail & Related papers (2024-06-02T14:22:09Z)
- MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [54.64194935409982]
We introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer-wise RGBA decompositions.
MuLAn is the first photorealistic resource providing instance decomposition and spatial information for high quality images.
We aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
arXiv Detail & Related papers (2024-04-03T14:58:00Z)
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to each semantic, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving about 50% bitrate at comparable perceptual quality, which has strong potential applications in the next generation of storage and communication.
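A schematic of the interfaces implied by this summary, assuming nothing about the actual implementation; all class and field names here are hypothetical:

```python
# Interface-level sketch of a MISC-style pipeline (not the authors' code):
# three complementary streams are produced at the encoder and consumed
# jointly by a generative decoder.
from dataclasses import dataclass

@dataclass
class MiscBitstream:
    semantics: str       # LMM-extracted text description of the image
    semantic_map: bytes  # compressed map tying each semantic to a region
    pixel_stream: bytes  # extremely low-bitrate coded pixel information

def total_bits(bs: MiscBitstream) -> int:
    """Rate = sum of all three streams; text dominates at ultra-low bpp."""
    return 8 * (len(bs.semantics.encode("utf-8"))
                + len(bs.semantic_map)
                + len(bs.pixel_stream))

# A decoder would condition a generative model on `semantics`, spatially
# guided by `semantic_map`, and refined by `pixel_stream`.
example = MiscBitstream("a red kite over a beach", b"\x01\x02", b"\x00" * 64)
print(total_bits(example), "bits")
```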
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
- Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel [8.518076792914039]
We use GPT-4V and DALL-E3 from OpenAI to explore the quality-compression frontier for image compression.
We push semantic compression as low as 100 $\mu$bpp (up to $10,000\times$ smaller than JPEG) by introducing an iterative reflection process.
We further hypothesize that this 100 $\mu$bpp level represents a soft limit on semantic compression at standard image resolutions.
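A quick sanity check of the arithmetic behind these figures, under the common assumption of roughly 1 bpp for a typical JPEG:

```python
# At 100 microbits per pixel, a 1-megapixel image costs ~100 bits
# (~13 bytes) in total, versus ~1 bpp for JPEG: a factor of ~10,000.
pixels = 1024 * 1024    # 1 MP image
semantic_bpp = 100e-6   # 100 microbits per pixel
jpeg_bpp = 1.0          # ballpark JPEG rate (assumption)

semantic_bits = semantic_bpp * pixels
print(f"semantic: {semantic_bits:.0f} bits ({semantic_bits / 8:.1f} bytes)")
print(f"ratio vs JPEG: {jpeg_bpp / semantic_bpp:,.0f}x")
```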
arXiv Detail & Related papers (2024-02-21T05:14:30Z)
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
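One plausible way such text-image fusion can be wired is sketched below; the internals of the Semantic-Spatial Aware block are not given here, so this is a generic cross-attention stand-in with assumed dimensions:

```python
# Hedged sketch of text-guided side-information fusion (not the paper's
# block): image features attend to text-token features via cross-attention.
import torch
import torch.nn as nn

class TextImageFusion(nn.Module):
    def __init__(self, img_dim: int = 192, txt_dim: int = 512, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(txt_dim, img_dim)  # map text dim -> image dim
        self.attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor):
        # img_feat: (B, H*W, C) flattened spatial features (query)
        # txt_feat: (B, T, txt_dim) text-token embeddings (key/value)
        txt = self.proj(txt_feat)
        fused, _ = self.attn(img_feat, txt, txt)
        return img_feat + fused                  # residual fusion

fusion = TextImageFusion()
out = fusion(torch.randn(1, 16 * 16, 192), torch.randn(1, 77, 512))
print(out.shape)  # torch.Size([1, 256, 192])
```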
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
- The Devil Is in the Details: Window-based Attention for Image Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
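A minimal sketch of window-based local attention in this spirit (not the paper's exact block): self-attention is computed independently inside non-overlapping spatial windows, and the residual form makes it plug-and-play.

```python
# Window-based local attention: partition the feature map into
# non-overlapping windows and attend within each window only.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim: int = 96, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) with H, W divisible by the window size
        B, C, H, W = x.shape
        w = self.window
        # partition into (B * num_windows, w*w, C) token groups
        t = x.view(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        out, _ = self.attn(t, t, t)            # attention within each window
        # reverse the partition back to (B, C, H, W)
        out = out.view(B, H // w, W // w, w, w, C)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return x + out                         # plug-and-play residual form

block = WindowAttention()
print(block(torch.randn(1, 96, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```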
arXiv Detail & Related papers (2022-03-16T07:55:49Z)
- Enhanced Invertible Encoding for Learned Image Compression [40.21904131503064]
In this paper, we propose an enhanced Invertible Encoding Network with invertible neural networks (INNs) to largely mitigate the information loss problem for better compression.
Experimental results on the Kodak, CLIC, and Tecnick datasets show that our method outperforms the existing learned image compression methods.
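For readers unfamiliar with INNs, the following is a generic additive coupling layer, the standard building block behind such invertible designs (a sketch, not this paper's architecture); its forward map is exactly invertible, so the transform itself loses no information:

```python
# Additive coupling layer: split channels, transform one half conditioned
# on the other. The inverse subtracts the same function, so reconstruction
# through the transform is exact.
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        half = channels // 2
        self.f = nn.Sequential(nn.Conv2d(half, half, 3, padding=1),
                               nn.ReLU(),
                               nn.Conv2d(half, half, 3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)  # y2 = x2 + f(x1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)  # exact reconstruction

layer = AdditiveCoupling()
x = torch.randn(1, 64, 16, 16)
print(torch.allclose(layer.inverse(layer(x)), x, atol=1e-6))  # True
```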
arXiv Detail & Related papers (2021-08-08T17:32:10Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)