Related papers: Learned Image Transmission with Hierarchical Variational Autoencoder

Learned Image Transmission with Hierarchical Variational Autoencoder

URL: http://arxiv.org/abs/2408.16340v3
Date: Tue, 10 Sep 2024 06:35:59 GMT
Title: Learned Image Transmission with Hierarchical Variational Autoencoder
Authors: Guangyi Zhang, Hanlei Li, Yunlong Cai, Qiyu Hu, Guanding Yu, Runmin Zhang,
Abstract summary: We introduce an innovative hierarchical joint source-channel coding (HJSCC) framework for image transmission. Our approach leverages a combination of bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image. Our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise.
Score: 28.084648666081943
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we introduce an innovative hierarchical joint source-channel coding (HJSCC) framework for image transmission, utilizing a hierarchical variational autoencoder (VAE). Our approach leverages a combination of bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image. These representations are then directly mapped to channel symbols for transmission by the JSCC encoder. We extend this framework to scenarios with a feedback link, modeling transmission over a noisy channel as a probabilistic sampling process and deriving a novel generative formulation for JSCC with feedback. Compared with existing approaches, our proposed HJSCC provides enhanced adaptability by dynamically adjusting transmission bandwidth, encoding these representations into varying amounts of channel symbols. Extensive experiments on images of varying resolutions demonstrate that our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise. The source code will be made available upon acceptance.

Related papers

Joint Source-Channel-Generation Coding: From Distortion-oriented Reconstruction to Semantic-consistent Generation [58.67925548779465]
We propose Joint Source-Channel-Generation Coding (JSCGC), a novel paradigm that shifts the focus from perceptual reconstruction to probabilistic generation.<n>JSCGC improves substantially semantic quality and semantic fidelity, significantly outperforming conventional distortion-oriented J SCC methods.
arXiv Detail & Related papers (2026-01-19T08:12:47Z)
Context Video Semantic Transmission with Variable Length and Rate Coding over MIMO Channels [49.624608869195065]
We propose the context video semantic transmission (CVST) framework for wireless video transmission.<n>We learn a context-channel correlation map to explicitly formulate the relationships between feature groups and multiple input multiple output (MIMO) subchannels.<n>We demonstrate substantial performance gains over various standardized separated coding methods and recent wireless video semantic communication approaches.
arXiv Detail & Related papers (2025-12-23T10:48:43Z)
VQ-DeepISC: Vector Quantized-Enabled Digital Semantic Communication with Channel Adaptive Image Transmission [8.858565507331395]
Discretization of semantic features enables interoperability between semantic and digital communication systems.<n>We propose a vector quantized (VQ)-enabled digital semantic communication system with channel adaptive image transmission.
arXiv Detail & Related papers (2025-08-01T02:35:34Z)
WVSC: Wireless Video Semantic Communication with Multi-frame Compensation [56.63352157833874]
Existing wireless video transmission schemes directly conduct video coding in pixel level. We propose a wireless video semantic communication framework, abbreviated as WVSC, which integrates the idea of semantic communication into wireless video transmission scenarios.
arXiv Detail & Related papers (2025-03-27T06:27:15Z)
SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Joint source-channel coding systems (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z)
Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters. We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [66.63250537475973]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative model.<n>Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal to noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS)
arXiv Detail & Related papers (2024-07-26T02:34:25Z)
Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission [24.372996233209854]
DiffJSCC is a novel framework that produces high-realism images via the conditional diffusion denoising process. It can achieve highly realistic reconstructions for 768x512 pixel Kodak images with only 3072 symbols.
arXiv Detail & Related papers (2024-04-27T00:12:13Z)
In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided domain-regularized and a encoder to regularize the inverted code in the native latent space of the pre-trained GAN model. We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Tranformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT) The proposed ICT can capture both global and local contexts from the latent representations. We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z)
Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images. We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z)
Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the emphde facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image. The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
SNR-adaptive deep joint source-channel coding for wireless image transmission [14.793908797250989]
An autoencoder-based novel deep joint source-channel coding scheme is proposed in this paper. The decoder can estimate the signal-to-noise ratio (SNR) and use it to adaptively decode the transmitted image.
arXiv Detail & Related papers (2021-01-30T10:30:04Z)
Bandwidth-Agile Image Transmission with Deep Joint Source-Channel Coding [7.081604594416339]
We consider the scenario in which images are transmitted progressively in layers over time or frequency. DeepJSCC-$l$ is an innovative solution that uses convolutional autoencoders. DeepJSCC-$l$ has comparable performance with state of the art digital progressive transmission schemes in the challenging low signal-to-noise ratio (SNR) and small bandwidth regimes.
arXiv Detail & Related papers (2020-09-26T00:11:50Z)
Wireless Image Retrieval at the Edge [20.45405359815043]
We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server. Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link. We propose two alternative schemes based on digital and analog communications, respectively.
arXiv Detail & Related papers (2020-07-21T16:15:40Z)
An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding. We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern. By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.