Related papers: Synchronous Multi-modal Semantic Communication System with Packet-level Coding

URL: http://arxiv.org/abs/2408.04535v2
Date: Sun, 11 Aug 2024 02:37:42 GMT
Title: Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Authors: Yun Tian, Jingkai Ying, Zhijin Qin, Ye Jin, Xiaoming Tao
Abstract summary: We propose a Synchronous Multimodal Semantic Communication System (SyncSC) with Packet-Level Coding. To achieve semantic and time synchronization, 3D Morphable Model (3DMM) coefficients and text are transmitted as semantics. To protect semantic packets under the erasure channel, we propose a packet-level Forward Error Correction (FEC) method, called PacSC, that maintains a certain visual quality performance even at high packet loss rates.
Score: 20.397350999784276
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although semantic communication with joint semantic-channel coding design has shown promising performance in transmitting data of different modalities over physical layer channels, the synchronization and packet-level forward error correction of multimodal semantics have not been well studied. Due to the independent design of semantic encoders, synchronizing multimodal features in both the semantic and time domains is a challenging problem. In this paper, we take facial video and speech transmission as an example and propose a Synchronous Multimodal Semantic Communication System (SyncSC) with Packet-Level Coding. To achieve semantic and time synchronization, 3D Morphable Model (3DMM) coefficients and text are transmitted as semantics, and we propose a semantic codec that achieves similar quality of reconstruction and synchronization with lower bandwidth, compared to traditional methods. To protect semantic packets under the erasure channel, we propose a packet-level Forward Error Correction (FEC) method, called PacSC, that maintains a certain visual quality performance even at high packet loss rates. In particular, for text packets, a text packet loss concealment module, called TextPC, based on Bidirectional Encoder Representations from Transformers (BERT) is proposed, which significantly improves the performance of traditional FEC methods. The simulation results show that our proposed SyncSC reduces transmission overhead and achieves high-quality synchronous transmission of video and speech over a packet loss network.
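As a rough illustration of what packet-level FEC over an erasure channel means, the sketch below uses a single XOR parity packet per group. This is a generic textbook scheme, not the paper's PacSC method, whose construction is not described in the abstract. The function names `encode_group` and `decode_group` are hypothetical, and the sketch assumes equal-length packets and that the receiver knows which packets were erased.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(packets: List[bytes]) -> List[bytes]:
    """Append one XOR parity packet to a group of equal-length data packets.

    Illustrative single-parity code only; not the PacSC scheme from the paper.
    """
    if len({len(p) for p in packets}) != 1:
        raise ValueError("all packets in a group must have the same length")
    parity = reduce(xor_bytes, packets)
    return packets + [parity]


def decode_group(received: List[Optional[bytes]]) -> Optional[List[bytes]]:
    """Recover the data packets of one group.

    `received` holds the data packets followed by the parity packet, with
    None marking an erased packet. A single erasure is repaired by XOR-ing
    the surviving packets; two or more erasures cannot be repaired by this
    code, so None is returned.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        return None
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


# Example: three 4-byte data packets, the second one erased in transit.
data = [b"3dmm", b"text", b"sync"]
sent = encode_group(data)
recovered = decode_group([sent[0], None, sent[2], sent[3]])
print(recovered == data)
```

A single parity packet tolerates only one loss per group, so a scheme meant for high packet loss rates would need stronger codes or, as the abstract suggests for text, a concealment step at the receiver.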

Related papers

Context Video Semantic Transmission with Variable Length and Rate Coding over MIMO Channels [49.624608869195065]
We propose the context video semantic transmission (CVST) framework for wireless video transmission. We learn a context-channel correlation map to explicitly formulate the relationships between feature groups and multiple input multiple output (MIMO) subchannels. We demonstrate substantial performance gains over various standardized separated coding methods and recent wireless video semantic communication approaches.
arXiv Detail & Related papers (2025-12-23T10:48:43Z)
Large Speech Model Enabled Semantic Communication [58.027223937172955]
This paper presents a Large Speech Model enabled Semantic Communication (LargeSC) system. We exploit the rich semantic knowledge embedded in large models and enable adaptive transmission over lossy channels. The system supports bandwidths ranging from 550 bps to 2.06 kbps and outperforms conventional baselines in speech quality under high packet loss rates.
arXiv Detail & Related papers (2025-12-04T11:58:08Z)
Channel-Aware Vector Quantization for Robust Semantic Communication on Discrete Channels [5.680520767606761]
We propose a channel-aware vector quantization (CAVQ) algorithm within a joint source-channel coding framework, termed VQJSCC. In this framework, semantic features are discretized and directly mapped to modulation constellation symbols, while CAVQ integrates channel transition probabilities into the quantization process. A multi-codebook alignment mechanism is also introduced to handle mismatches between codebook order and modulation order by decomposing the transmission stream into subchannels.
arXiv Detail & Related papers (2025-10-21T13:02:35Z)
Conquering High Packet-Loss Erasure: MoE Swin Transformer-Based Video Semantic Communication [11.845717685362814]
A packet-loss-resistant MoE Swin Transformer-based Video Semantic Communication (MSTVSC) system is proposed in this paper.
arXiv Detail & Related papers (2025-08-02T05:41:52Z)
WVSC: Wireless Video Semantic Communication with Multi-frame Compensation [56.63352157833874]
Existing wireless video transmission schemes conduct video coding directly at the pixel level. We propose a wireless video semantic communication framework, abbreviated as WVSC, which integrates the idea of semantic communication into wireless video transmission scenarios.
arXiv Detail & Related papers (2025-03-27T06:27:15Z)
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling [81.37449968164692]
We propose Synchronized Coupled Sampling (SynCoS), a novel inference framework that synchronizes denoising paths across the entire video. Our approach combines two complementary sampling strategies, which ensure seamless local transitions and enforce global coherence. Extensive experiments show that SynCoS significantly improves multi-event long video generation, achieving smoother transitions and superior long-range coherence.
arXiv Detail & Related papers (2025-03-11T16:43:45Z)
Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model [55.71885688565501]
We propose a scalable generative video semantic communication framework that extracts and transmits semantic information to achieve high-quality video reconstruction. Specifically, at the transmitter, description and other condition signals are extracted from the source video, functioning as text and structural semantics, respectively. At the receiver, the diffusion-based GenAI large models are utilized to fuse the semantics of the multiple modalities for reconstructing the video.
arXiv Detail & Related papers (2025-02-19T15:59:07Z)
Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
arXiv Detail & Related papers (2025-02-12T09:01:25Z)
Cross-Layer Encrypted Semantic Communication Framework for Panoramic Video Transmission [11.438045765196332]
We propose a cross-layer encrypted semantic communication (CLESC) framework for panoramic video transmission. We propose an adaptive cross-layer transmission mechanism that dynamically adjusts CRC, channel coding, and retransmission schemes based on the importance of semantic information. Compared to traditional cross-layer transmission schemes, the CLESC framework can reduce bandwidth consumption by 85%.
arXiv Detail & Related papers (2024-11-19T07:18:38Z)
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing [81.32613443072441]
For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired. We propose a method called Quantized Contrastive Token-Acoustic Pre-training (VQ-CTAP), which uses the cross-modal sequence transcoder to bring text and speech into a joint space.
arXiv Detail & Related papers (2024-08-11T12:24:23Z)
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z)
Visual Language Model based Cross-modal Semantic Communication Systems [42.321208020228894]
We propose a novel Vision-Language Model-based Cross-modal Semantic Communication system. The VLM-CSC comprises three novel components. The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.
arXiv Detail & Related papers (2024-05-06T08:59:16Z)
Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models [43.27015039765803]
We develop a latency-aware semantic communications framework with pre-trained generative models. We demonstrate ultra-low-rate, low-latency, and channel-adaptive semantic communications.
arXiv Detail & Related papers (2024-03-25T23:04:09Z)
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding. In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z)
Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
A federated learning-based semantic communication (FLSC) framework is presented for multi-task distributed image transmission with IoT devices. Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator. A channel state information-based multiple-input multiple-output transmission module is designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
Enabling the Wireless Metaverse via Semantic Multiverse Communication [82.47169682083806]
Metaverse over wireless networks is an emerging use case of the sixth generation (6G) wireless systems. We propose a novel semantic communication framework by decomposing the metaverse into human/machine agent-specific semantic multiverses (SMs). An SM stored at each agent comprises a semantic encoder and a generator, leveraging recent advances in generative artificial intelligence (AI).
arXiv Detail & Related papers (2022-12-13T21:21:07Z)
Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
Wireless Deep Video Semantic Transmission [14.071114007641313]
We propose a new class of high-efficiency deep joint source-channel coding methods to achieve end-to-end video transmission over wireless channels. Our framework is collected under the name deep video semantic transmission (DVST).
arXiv Detail & Related papers (2022-05-26T03:26:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.