Learned Image Transmission with Hierarchical Variational Autoencoder
- URL: http://arxiv.org/abs/2408.16340v3
- Date: Tue, 10 Sep 2024 06:35:59 GMT
- Title: Learned Image Transmission with Hierarchical Variational Autoencoder
- Authors: Guangyi Zhang, Hanlei Li, Yunlong Cai, Qiyu Hu, Guanding Yu, Runmin Zhang,
- Abstract summary: We introduce an innovative hierarchical joint source-channel coding (HJSCC) framework for image transmission.
Our approach leverages a combination of bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image.
Our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise.
- Score: 28.084648666081943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce an innovative hierarchical joint source-channel coding (HJSCC) framework for image transmission, utilizing a hierarchical variational autoencoder (VAE). Our approach leverages a combination of bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image. These representations are then directly mapped to channel symbols for transmission by the JSCC encoder. We extend this framework to scenarios with a feedback link, modeling transmission over a noisy channel as a probabilistic sampling process and deriving a novel generative formulation for JSCC with feedback. Compared with existing approaches, our proposed HJSCC provides enhanced adaptability by dynamically adjusting transmission bandwidth, encoding these representations into varying amounts of channel symbols. Extensive experiments on images of varying resolutions demonstrate that our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise. The source code will be made available upon acceptance.
Related papers
- Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission [24.372996233209854]
DiffJSCC is a novel framework that produces high-realism images via the conditional diffusion denoising process.
It can achieve highly realistic reconstructions for 768x512 pixel Kodak images with only 3072 symbols.
arXiv Detail & Related papers (2024-04-27T00:12:13Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided domain-regularized and a encoder to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Tranformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT)
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z) - Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL)
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the emphde facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - SpectralFormer: Rethinking Hyperspectral Image Classification with
Transformers [91.09957836250209]
Hyperspectral (HS) images are characterized by approximately contiguous spectral information.
CNNs have been proven to be a powerful feature extractor in HS image classification.
We propose a novel backbone network called ulSpectralFormer for HS image classification.
arXiv Detail & Related papers (2021-07-07T02:59:21Z) - SNR-adaptive deep joint source-channel coding for wireless image
transmission [14.793908797250989]
An autoencoder-based novel deep joint source-channel coding scheme is proposed in this paper.
The decoder can estimate the signal-to-noise ratio (SNR) and use it to adaptively decode the transmitted image.
arXiv Detail & Related papers (2021-01-30T10:30:04Z) - Bandwidth-Agile Image Transmission with Deep Joint Source-Channel Coding [7.081604594416339]
We consider the scenario in which images are transmitted progressively in layers over time or frequency.
DeepJSCC-$l$ is an innovative solution that uses convolutional autoencoders.
DeepJSCC-$l$ has comparable performance with state of the art digital progressive transmission schemes in the challenging low signal-to-noise ratio (SNR) and small bandwidth regimes.
arXiv Detail & Related papers (2020-09-26T00:11:50Z) - Wireless Image Retrieval at the Edge [20.45405359815043]
We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server.
Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link.
We propose two alternative schemes based on digital and analog communications, respectively.
arXiv Detail & Related papers (2020-07-21T16:15:40Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.