Conquering High Packet-Loss Erasure: MoE Swin Transformer-Based Video Semantic Communication
- URL: http://arxiv.org/abs/2508.01205v1
- Date: Sat, 02 Aug 2025 05:41:52 GMT
- Title: Conquering High Packet-Loss Erasure: MoE Swin Transformer-Based Video Semantic Communication
- Authors: Lei Teng, Senran Fan, Chen Dong, Haotai Liang, Zhicheng Bao, Xiaodong Xu, Rui Meng, Ping Zhang,
- Abstract summary: packet-loss-resistant MoE Swin Transformer-based Video Semantic Communication (MSTVSC) system is proposed in this paper.<n>To address this issue, a packet-loss-resistant MoE Swin Transformer-based Video Semantic Communication (MSTVSC) system is proposed in this paper.
- Score: 11.845717685362814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic communication with joint semantic-channel coding robustly transmits diverse data modalities but faces challenges in mitigating semantic information loss due to packet drops in packet-based systems. Under current protocols, packets with errors are discarded, preventing the receiver from utilizing erroneous semantic data for robust decoding. To address this issue, a packet-loss-resistant MoE Swin Transformer-based Video Semantic Communication (MSTVSC) system is proposed in this paper. Semantic vectors are encoded by MSTVSC and transmitted through upper-layer protocol packetization. To investigate the impact of the packetization, a theoretical analysis of the packetization strategy is provided. To mitigate the semantic loss caused by packet loss, a 3D CNN at the receiver recovers missing information using un-lost semantic data and an packet-loss mask matrix. Semantic-level interleaving is employed to reduce concentrated semantic loss from packet drops. To improve compression, a common-individual decomposition approach is adopted, with downsampling applied to individual information to minimize redundancy. The model is lightweighted for practical deployment. Extensive simulations and comparisons demonstrate strong performance, achieving an MS-SSIM greater than 0.6 and a PSNR exceeding 20 dB at a 90% packet loss rate.
Related papers
- Distributed Training under Packet Loss [8.613477072763404]
Leveraging unreliable connections will reduce latency but may sacrifice model accuracy and convergence once packets are dropped.<n>We introduce a principled, end-to-end solution that preserves accuracy and convergence guarantees under genuine packet loss.<n>This work bridges the gap between communication-efficient protocols and the accuracy and guarantees demanded by modern large-model training.
arXiv Detail & Related papers (2025-07-02T11:07:20Z) - MIETT: Multi-Instance Encrypted Traffic Transformer for Encrypted Traffic Classification [59.96233305733875]
Classifying traffic is essential for detecting security threats and optimizing network management.<n>We propose a Multi-Instance Encrypted Traffic Transformer (MIETT) to capture both token-level and packet-level relationships.<n>MIETT achieves results across five datasets, demonstrating its effectiveness in classifying encrypted traffic and understanding complex network behaviors.
arXiv Detail & Related papers (2024-12-19T12:52:53Z) - Synchronous Multi-modal Semantic Communication System with Packet-level Coding [20.397350999784276]
We propose a Synchronous Multimodal Semantic Communication System (SyncSC) with Packet-Level Coding.
To achieve semantic and time synchronization, 3D Morphable Mode (3DMM) coefficients and text are transmitted as semantics.
To protect semantic packets under the erasure channel, we propose a packet-Level Forward Error Correction (FEC) method, called PacSC, that maintains a certain visual quality performance even at high packet loss rates.
arXiv Detail & Related papers (2024-08-08T15:42:00Z) - Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [66.63250537475973]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative model.<n>Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal to noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS)
arXiv Detail & Related papers (2024-07-26T02:34:25Z) - Secure Semantic Communication via Paired Adversarial Residual Networks [59.468221305630784]
This letter explores the positive side of the adversarial attack for the security-aware semantic communication system.
A pair of matching pluggable modules is installed: one after the semantic transmitter and the other before the semantic receiver.
The proposed scheme is capable of fooling the eavesdropper while maintaining the high-quality semantic communication.
arXiv Detail & Related papers (2024-07-02T08:32:20Z) - Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises [18.539501941328393]
This paper develops a latent diffusion model-enabled SemCom system to handle outliers in source data.<n>A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter.<n>An end-to-end consistency distillation strategy is used to distill the diffusion models trained in latent space.
arXiv Detail & Related papers (2024-06-09T23:39:31Z) - A Transformer-Based Framework for Payload Malware Detection and Classification [0.0]
Techniques such as Deep Packet Inspection (DPI) have been introduced to allow IDSs analyze the content of network packets.
In this paper, we propose a revolutionary DPI algorithm based on transformers adapted for the purpose of detecting malicious traffic.
arXiv Detail & Related papers (2024-03-27T03:25:45Z) - Spatiotemporal Attention-based Semantic Compression for Real-time Video
Recognition [117.98023585449808]
We propose a temporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and pixels in each frame.
We develop a lightweight decoder that leverages a 3D-2D CNN combined to reconstruct missing information.
Experimental results show that ViT_STAE can compress the video dataset H51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z) - Is Semantic Communications Secure? A Tale of Multi-Domain Adversarial
Attacks [70.51799606279883]
We introduce test-time adversarial attacks on deep neural networks (DNNs) for semantic communications.
We show that it is possible to change the semantics of the transferred information even when the reconstruction loss remains low.
arXiv Detail & Related papers (2022-12-20T17:13:22Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural networks, which can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different dataset, image compression using the MNIST dataset, and image denoising using fashion MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - Packet-Loss-Tolerant Split Inference for Delay-Sensitive Deep Learning
in Lossy Wireless Networks [4.932130498861988]
In distributed inference, computational tasks are offloaded from the IoT device to other devices or the edge server via lossy IoT networks.
narrow-band and lossy IoT networks cause non-negligible packet losses and retransmissions, resulting in non-negligible communication latency.
We propose a split inference with no retransmissions (SI-NR) method that achieves high accuracy without any retransmissions, even when packet loss occurs.
arXiv Detail & Related papers (2021-04-28T08:28:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.