Large Speech Model Enabled Semantic Communication
- URL: http://arxiv.org/abs/2512.04711v1
- Date: Thu, 04 Dec 2025 11:58:08 GMT
- Title: Large Speech Model Enabled Semantic Communication
- Authors: Yun Tian, Zhijin Qin, Guocheng Lv, Ye Jin, Kaibin Huang, Zhu Han
- Abstract summary: We propose a Large Speech Model enabled Semantic Communication (LargeSC) system. We exploit the rich semantic knowledge embedded in large models to enable adaptive transmission over lossy channels. The system supports bandwidths ranging from 550 bps to 2.06 kbps and outperforms conventional baselines in speech quality under high packet loss rates.
- Score: 58.027223937172955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing speech semantic communication systems, mainly based on Joint Source-Channel Coding (JSCC) architectures, have demonstrated impressive performance, but their effectiveness remains limited by model structures specifically designed for particular tasks and datasets. Recent advances indicate that generative large models pre-trained on massive datasets can exhibit exceptional performance across diverse downstream tasks with minimal fine-tuning. To exploit the rich semantic knowledge embedded in large models and enable adaptive transmission over lossy channels, we propose a Large Speech Model enabled Semantic Communication (LargeSC) system. Simultaneously achieving adaptive compression and robust transmission over lossy channels remains challenging, requiring trade-offs among compression efficiency, speech quality, and latency. In this work, we employ Mimi as the speech codec, converting speech into discrete tokens compatible with existing network architectures. We propose an adaptive controller module that enables adaptive transmission and in-band Unequal Error Protection (UEP), dynamically adjusting to both speech content and packet loss probability under bandwidth constraints. Additionally, we employ Low-Rank Adaptation (LoRA) to fine-tune the Moshi foundation model for generative recovery of lost speech tokens. Simulation results show that the proposed system supports bandwidths ranging from 550 bps to 2.06 kbps, outperforms conventional baselines in speech quality under high packet loss rates, and achieves an end-to-end latency of approximately 460 ms, demonstrating its potential for real-time deployment.
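The abstract's use of LoRA to fine-tune the Moshi foundation model can be illustrated with a minimal sketch of the LoRA mechanism itself: a frozen weight matrix W is adapted by a low-rank product B A, so only r * (d_in + d_out) parameters are trained instead of d_in * d_out. The shapes, toy values, and scaling factor below are illustrative assumptions following the general LoRA formulation, not details from the paper, and the Moshi model itself is not used here.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x): frozen base path plus low-rank adapter."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

# Toy shapes: d_out = d_in = 3, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen pre-trained weight
A = [[0.1, 0.2, 0.3]]                                    # r x d_in, trainable
B = [[0.0], [0.0], [0.0]]                                # d_out x r, zero init
x = [1.0, 2.0, 3.0]

# With B initialised to zero, the adapted layer equals the frozen layer.
print(lora_forward(x, W, A, B, alpha=2.0, r=1))  # -> [1.0, 2.0, 3.0]
```

During fine-tuning only A and B would receive gradients, which is what makes adapting a large foundation model for a narrow task (here, recovery of lost speech tokens) comparatively cheap.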
Related papers
- SemanticNN: Compressive and Error-Resilient Semantic Offloading for Extremely Weak Devices [9.795432423267503]
We propose SemanticNN, a semantic codec that tolerates bit-level errors in pursuit of semantic-level correctness. It incorporates a Bit Error Rate (BER)-aware decoder that adapts to dynamic channel conditions and a Soft Quantization (SQ)-based encoder to learn compact representations. We conduct extensive experiments on STM32 using three models and six datasets across image classification and object detection tasks.
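The soft-quantization idea mentioned for SemanticNN's encoder can be sketched as follows: instead of snapping a value to its nearest codeword (a non-differentiable step), take a softmax over negative distances so the assignment becomes a differentiable mixture of codewords. The codebook and temperature below are toy assumptions for illustration, not values from the paper.

```python
import math

def soft_quantize(x, codebook, temperature=0.1):
    """Differentiable surrogate for nearest-codeword quantization."""
    # Softmax weights over negative squared distances to each codeword.
    logits = [-(x - c) ** 2 / temperature for c in codebook]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Output is a convex combination of codewords.
    return sum(w * c for w, c in zip(weights, codebook))

codebook = [-1.0, 0.0, 1.0]
# Low temperature -> close to hard nearest-codeword quantization (~1.0 here).
print(round(soft_quantize(0.9, codebook, temperature=0.01), 3))
# High temperature -> smooth blend of all codewords.
print(round(soft_quantize(0.9, codebook, temperature=10.0), 3))
```

Annealing the temperature toward zero during training recovers hard quantization at inference time while keeping gradients usable during learning.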
arXiv Detail & Related papers (2025-11-14T07:47:25Z) - BADiff: Bandwidth Adaptive Diffusion Model [55.10134744772338]
Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations. In practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation. We introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth.
arXiv Detail & Related papers (2025-10-24T11:50:03Z) - Distributionally Robust Wireless Semantic Communication with Large AI Models [111.47794569742206]
Current SemCom systems fail to generalize across diverse noise conditions, adversarial attacks, and out-of-distribution data. Wasserstein distributionally robust optimization is employed to provide resilience against semantic misinterpretation and channel perturbations. Experimental results on image and text transmission demonstrate that WaSeCom achieves improved robustness under noise and adversarial perturbations.
arXiv Detail & Related papers (2025-05-28T04:03:57Z) - Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks [12.180483357502293]
This paper proposes a novel framework for real-time adaptive bitrate video streaming by integrating Latent Diffusion Models (LDMs) within the FF techniques. The proposed approach leverages LDMs to compress I-frames into a latent space, offering significant storage and semantic transmission savings. This work opens new possibilities for scalable real-time video streaming in 5G and future post-5G networks.
arXiv Detail & Related papers (2025-02-08T21:14:28Z) - Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [66.63250537475973]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS).
arXiv Detail & Related papers (2024-07-26T02:34:25Z) - Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises [18.539501941328393]
This paper develops a latent diffusion model-enabled SemCom system to handle outliers in source data. A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter. An end-to-end consistency distillation strategy is used to distill the diffusion models trained in latent space.
arXiv Detail & Related papers (2024-06-09T23:39:31Z) - Diff-GO: Diffusion Goal-Oriented Communications to Achieve Ultra-High
Spectrum Efficiency [46.92279990929111]
This work presents an ultra-efficient communication design utilizing generative AI based on diffusion models.
We propose a new low-dimensional noise space for the training of diffusion models, which significantly reduces the communication overhead.
Our experimental results demonstrate that the proposed noise space and the diffusion-based generative model achieve ultra-high spectrum efficiency and accurate recovery of transmitted image signals.
arXiv Detail & Related papers (2023-11-13T17:52:44Z) - Generative AI-aided Joint Training-free Secure Semantic Communications
via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding.
In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z) - Toward Adaptive Semantic Communications: Efficient Data Transmission via
Online Learned Nonlinear Transform Source-Channel Coding [11.101344530143303]
We propose an online learned joint source and channel coding approach that leverages the deep learning model's overfitting property.
Specifically, we update the off-the-shelf pre-trained models after deployment in a lightweight online fashion to adapt to the distribution shifts in source data and environment domain.
We take the overfitting concept to the extreme, proposing a series of implementation-friendly methods to adapt the model or representations to an individual data or channel state instance.
arXiv Detail & Related papers (2022-11-08T16:00:27Z) - A Study of Designing Compact Audio-Visual Wake Word Spotting System
Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF).
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete variables (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a design can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.