Conceptual Compression via Deep Structure and Texture Synthesis
- URL: http://arxiv.org/abs/2011.04976v2
- Date: Thu, 10 Mar 2022 10:53:06 GMT
- Title: Conceptual Compression via Deep Structure and Texture Synthesis
- Authors: Jianhui Chang, Zhenghui Zhao, Chuanmin Jia, Shiqi Wang, Lingbo Yang,
Qi Mao, Jian Zhang, Siwei Ma
- Abstract summary: We propose a novel conceptual compression framework that encodes visual data into compact structure and texture representations, then decodes in a deep synthesis fashion.
In particular, we propose to compress images by a dual-layered model consisting of two complementary visual features.
At the encoder side, the structural maps and texture representations are individually extracted and compressed, generating compact, interpretable, and inter-operable bitstreams.
During the decoding stage, a hierarchical fusion GAN (HF-GAN) is proposed to learn the synthesis paradigm in which textures are rendered into the decoded structural maps, leading to high-quality reconstruction with remarkable visual realism.
- Score: 42.68994438290913
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing compression methods typically focus on the removal of signal-level
redundancies, while the potential and versatility of decomposing visual data
into compact conceptual components still lack further study. To this end, we
propose a novel conceptual compression framework that encodes visual data into
compact structure and texture representations, then decodes in a deep synthesis
fashion, aiming to achieve better visual reconstruction quality, flexible
content manipulation, and potential support for various vision tasks. In
particular, we propose to compress images with a dual-layered model consisting of
two complementary visual features: 1) a structure layer represented by structural
maps and 2) a texture layer characterized by low-dimensional deep representations.
At the encoder side, the structural maps and texture representations are
individually extracted and compressed, generating compact, interpretable, and
inter-operable bitstreams. During the decoding stage, a
hierarchical fusion GAN (HF-GAN) is proposed to learn the synthesis paradigm
where the textures are rendered into the decoded structural maps, leading to
high-quality reconstruction with remarkable visual realism. Extensive
experiments on diverse images have demonstrated the superiority of our
framework with lower bitrates, higher reconstruction quality, and increased
versatility towards visual analysis and content manipulation tasks.
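To make the dual-layer idea concrete, here is a minimal PyTorch sketch based only on the abstract above: a structural map is extracted from the image, a low-dimensional texture code is produced separately (each would be written to its own bitstream), and a generator renders the texture back onto the decoded structure. The module names (StructureExtractor, TextureEncoder, FusionGenerator), layer sizes, and the single-scale fusion are illustrative assumptions; this is not the paper's HF-GAN or its entropy coding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructureExtractor(nn.Module):
    """Stand-in for the structure layer: a crude edge/sketch map."""
    def forward(self, x):                       # x: (B, 3, H, W) in [0, 1]
        gray = x.mean(dim=1, keepdim=True)      # luminance
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel = torch.stack([kx, kx.t()]).unsqueeze(1)       # (2, 1, 3, 3)
        grad = F.conv2d(gray, sobel.to(x), padding=1)
        return grad.norm(dim=1, keepdim=True)                # (B, 1, H, W) edge map


class TextureEncoder(nn.Module):
    """Stand-in for the texture layer: a low-dimensional appearance code."""
    def __init__(self, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, code_dim),
        )

    def forward(self, x):
        return self.net(x)                                    # (B, code_dim)


class FusionGenerator(nn.Module):
    """Simplified decoder: the texture code modulates structure features.
    The real HF-GAN fuses hierarchically across scales; this is single-scale."""
    def __init__(self, code_dim=64, width=32):
        super().__init__()
        self.enc = nn.Conv2d(1, width, 3, padding=1)
        self.mod = nn.Linear(code_dim, 2 * width)             # per-channel scale/shift
        self.dec = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, structure, texture_code):
        h = F.relu(self.enc(structure))
        scale, shift = self.mod(texture_code).chunk(2, dim=1)
        h = h * scale[..., None, None] + shift[..., None, None]
        return self.dec(h)


if __name__ == "__main__":
    x = torch.rand(1, 3, 128, 128)
    structure = StructureExtractor()(x)   # structure layer -> its own bitstream
    texture = TextureEncoder()(x)         # texture layer   -> its own bitstream
    recon = FusionGenerator()(structure, texture)
    print(structure.shape, texture.shape, recon.shape)
```

In the actual framework the two representations would be entropy-coded and fused hierarchically; the stand-in generator only shows where the texture code conditions the decoded structural map.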
Related papers
- Texture-guided Coding for Deep Features [33.05814372247946]
This paper investigates features and textures and proposes a texture-guided feature compression strategy based on their characteristics.
The strategy comprises feature layers and texture layers; the feature layers serve the machine and include a feature selection module and a feature reconstruction network.
With the assistance of texture images, the feature layers selectively compress and transmit channels relevant to visual tasks, reducing feature data while providing high-quality features for the machine.
Our method fully exploits the characteristics of texture and features. It eliminates feature redundancy, reconstructs high-quality preview images for humans, and supports decision-making.
arXiv Detail & Related papers (2024-05-30T03:38:44Z)
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- Implicit-explicit Integrated Representations for Multi-view Video Compression [40.86402535896703]
We propose an implicit-explicit integrated representation for multi-view video compression.
The proposed framework combines the strengths of both implicit neural representation and explicit 2D datasets.
Our proposed framework can achieve comparable or even superior performance to the latest multi-view video compression standard MIV.
arXiv Detail & Related papers (2023-11-29T04:15:57Z)
- Unsupervised Structure-Consistent Image-to-Image Translation [6.282068591820945]
The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation.
We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code (a minimal gradient-reversal sketch appears after this list).
arXiv Detail & Related papers (2022-08-24T13:47:15Z)
- Image Inpainting via Conditional Texture and Structure Dual Generation [26.97159780261334]
We propose a novel two-stream network for image inpainting, which models the structure-constrained texture synthesis and texture-guided structure reconstruction.
To enhance the global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine the structure and texture information (a gated-fusion sketch appears after this list).
Experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method.
arXiv Detail & Related papers (2021-08-22T15:44:37Z)
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE [74.29384873537587]
We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.
Experimental results on CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images.
arXiv Detail & Related papers (2021-03-18T05:10:49Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments conducted on the DeepFashion benchmark dataset have demonstrated the superiority of our framework compared with existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
- Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and the enhancement layer is further filled with feature-level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We propose a novel image coding framework that leverages both compressive and generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
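The "Unsupervised Structure-Consistent Image-to-Image Translation" entry above says its auxiliary module is built on gradient reversal layers. Below is a minimal, generic gradient reversal layer in PyTorch; the class names and the lambda value are illustrative, and the summary does not say how the layer is wired into the swapping autoencoder or how the all-zero texture code loss is formed.

```python
import torch
import torch.nn as nn


class _GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None   # no gradient for lambda


class GradientReversal(nn.Module):
    def __init__(self, lamb=1.0):
        super().__init__()
        self.lamb = lamb

    def forward(self, x):
        return _GradReverse.apply(x, self.lamb)


if __name__ == "__main__":
    grl = GradientReversal(lamb=0.5)
    x = torch.randn(4, 8, requires_grad=True)
    grl(x).sum().backward()
    print(x.grad[0, :3])   # gradients are flipped: every entry equals -0.5
```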
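The Bi-GFF module in the "Image Inpainting via Conditional Texture and Structure Dual Generation" entry exchanges information between structure and texture features. The sketch below shows one plausible bidirectional gated fusion in PyTorch; the gating design, channel counts, and update order are assumptions, not the paper's exact Bi-GFF.

```python
import torch
import torch.nn as nn


class BiGatedFusion(nn.Module):
    """Two feature streams exchange information through learned soft gates."""
    def __init__(self, channels=64):
        super().__init__()
        # Each gate is predicted from the concatenation of both branches.
        self.gate_s = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_t = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, f_structure, f_texture):
        both = torch.cat([f_structure, f_texture], dim=1)
        g_s = self.gate_s(both)   # how much texture flows into the structure branch
        g_t = self.gate_t(both)   # how much structure flows into the texture branch
        f_structure = f_structure + g_s * f_texture
        f_texture = f_texture + g_t * f_structure
        return f_structure, f_texture


if __name__ == "__main__":
    fs, ft = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    out_s, out_t = BiGatedFusion()(fs, ft)
    print(out_s.shape, out_t.shape)
```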