DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations
- URL: http://arxiv.org/abs/2601.03112v1
- Date: Tue, 06 Jan 2026 15:42:45 GMT
- Title: DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations
- Authors: Kailin Tan, Jincheng Dai, Sixian Wang, Guo Lu, Shuo Shao, Kai Niu, Wenjun Zhang, Ping Zhang
- Abstract summary: Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm. We propose DiT-JSCC, a novel GJSCC backbone that jointly learns a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder. We show that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.
- Score: 32.904008725578606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders: the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that jointly learns a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder; our open-source project aims to promote future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve transmission efficiency, redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.
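The abstract's training-free, Kolmogorov-complexity-inspired bandwidth allocation can be approximated by using compressed length as a complexity proxy, since the compressed size of a patch upper-bounds its Kolmogorov complexity up to a constant. The sketch below is illustrative only and is not taken from the paper's codebase; the byte-list patch representation, the zlib proxy, and the name `allocate_bandwidth` are all assumptions.

```python
import zlib

def allocate_bandwidth(patches, total_symbols):
    """Split a channel-symbol budget across image patches in proportion
    to their zlib-compressed size, a cheap proxy for Kolmogorov
    complexity (illustrative sketch, not the paper's actual method)."""
    costs = [len(zlib.compress(bytes(p))) for p in patches]
    total_cost = sum(costs)
    # Proportional allocation, with at least one symbol per patch.
    return [max(1, round(total_symbols * c / total_cost)) for c in costs]

# A flat patch compresses well and should receive fewer symbols
# than a high-entropy one.
flat = [128] * 256
noisy = [(i * 97 + 31) % 256 for i in range(256)]
alloc = allocate_bandwidth([flat, noisy], total_symbols=1024)
```

Under this proxy, low-complexity regions are transmitted coarsely and regenerated by the diffusion decoder, while complex regions keep more of the budget.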
Related papers
- Joint Source-Channel-Generation Coding: From Distortion-oriented Reconstruction to Semantic-consistent Generation [58.67925548779465]
We propose Joint Source-Channel-Generation Coding (JSCGC), a novel paradigm that shifts the focus from perceptual reconstruction to probabilistic generation. JSCGC substantially improves semantic quality and semantic fidelity, significantly outperforming conventional distortion-oriented JSCC methods.
arXiv Detail & Related papers (2026-01-19T08:12:47Z) - SecDiff: Diffusion-Aided Secure Deep Joint Source-Channel Coding Against Adversarial Attacks [73.41290017870097]
SecDiff is a plug-and-play, diffusion-aided decoding framework. It significantly enhances the security and robustness of deep JSCC under adversarial wireless environments.
arXiv Detail & Related papers (2025-11-03T11:24:06Z) - Semantic Channel Equalization Strategies for Deep Joint Source-Channel Coding [8.967618587731694]
Deep joint source-channel coding (DeepJSCC) has emerged as a powerful paradigm for end-to-end semantic communications. Existing DeepJSCC schemes assume a shared latent space at the transmitter (TX) and receiver (RX), an assumption that may not hold in practice. This mismatch introduces "semantic noise", degrading reconstruction quality and downstream task performance.
arXiv Detail & Related papers (2025-10-06T10:29:07Z) - Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [45.347078403677216]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding. However, their direct deployment is often hindered by high computational complexity and resource requirements. This paper proposes a novel knowledge distillation based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z) - Channel Fingerprint Construction for Massive MIMO: A Deep Conditional Generative Approach [65.47969413708344]
We introduce the concept of CF twins and design a conditional generative diffusion model (CGDM). We employ a variational inference technique to derive the evidence lower bound (ELBO) for the log-marginal distribution of the observed fine-grained CF conditioned on the coarse-grained CF. We show that the proposed approach exhibits significant improvement in reconstruction performance compared to the baselines.
arXiv Detail & Related papers (2025-05-12T01:36:06Z) - SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Joint source-channel coding systems (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z) - Joint Source-Channel Coding: Fundamentals and Recent Progress in Practical Designs [6.059175509501795]
Joint source-channel coding (JSCC) offers an alternative end-to-end approach by optimizing compression and channel coding together.
This article provides an overview of the information-theoretic foundations of JSCC, surveys practical JSCC designs over the decades, and discusses the reasons for their limited adoption in practical systems.
arXiv Detail & Related papers (2024-09-26T06:10:29Z) - Learned Image Transmission with Hierarchical Variational Autoencoder [28.084648666081943]
We introduce an innovative hierarchical joint source-channel coding (HJSCC) framework for image transmission. Our approach leverages a combination of bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image. Our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise.
arXiv Detail & Related papers (2024-08-29T08:23:57Z) - Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [66.63250537475973]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS).
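Of the two metrics named above, PSNR has a simple closed form, 10 log10(MAX^2 / MSE), that is easy to verify; LPIPS, by contrast, requires a learned network. A minimal reference implementation of PSNR (pure Python, not tied to any paper's codebase; the flat pixel-list interface is an assumption for illustration):

```python
import math

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)

ref = [0, 64, 128, 192, 255]
noisy = [2, 62, 130, 190, 253]
# MSE here is exactly 4, so PSNR = 10 * log10(255^2 / 4) ≈ 42.11 dB
value = psnr(ref, noisy)
```

Higher PSNR means lower pixel-level error; it is the "pixel-level metric" referenced in the summary, while LPIPS captures perceptual similarity that PSNR misses.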
arXiv Detail & Related papers (2024-07-26T02:34:25Z) - Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z) - Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z) - Generative Joint Source-Channel Coding for Semantic Image Transmission [29.738666406095074]
Joint source-channel coding (JSCC) schemes using deep neural networks (DNNs) provide promising results in wireless image transmission.
We propose two novel JSCC schemes that leverage the perceptual quality of deep generative models (DGMs) for wireless image transmission.
arXiv Detail & Related papers (2022-11-24T19:14:27Z) - Perceptual Learned Source-Channel Coding for High-Fidelity Image Semantic Transmission [7.692038874196345]
In this paper, we introduce adversarial losses to optimize deep JSCC. Our new deep JSCC architecture combines an encoder, wireless channel, decoder/generator, and discriminator.
A user study confirms that, at perceptually similar end-to-end image transmission quality, the proposed method can save about 50% of the wireless channel bandwidth cost.
arXiv Detail & Related papers (2022-05-26T03:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.