Related papers: Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels

URL: http://arxiv.org/abs/2501.17879v1
Date: Mon, 20 Jan 2025 04:57:29 GMT
Title: Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels
Authors: Sagnik Bhattacharya, Muhammad Ahmed Mohsin, Ahsan Bilal, John M. Cioffi
Abstract summary: AR/VR applications require real-time transmission of correlated high-fidelity speech from multiple resource-constrained devices over unreliable, bandwidth-limited channels. Existing autoencoder-based speech source coding methods fail to jointly address dynamic bitrate adaptation without retraining, correlations among multiple speech sources, and the balance between downstream task loss and realism of the reconstructed speech. We propose a neural distributed principal component analysis (NDPCA)-aided distributed source coding algorithm for correlated speech sources transmitting to a central receiver.
Score: 3.674863913115431
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Emerging wireless AR/VR applications require real-time transmission of correlated high-fidelity speech from multiple resource-constrained devices over unreliable, bandwidth-limited channels. Existing autoencoder-based speech source coding methods fail to address the combination of the following: (1) dynamic bitrate adaptation without retraining the model, (2) leveraging correlations among multiple speech sources, and (3) balancing downstream task loss with realism of reconstructed speech. We propose a neural distributed principal component analysis (NDPCA)-aided distributed source coding algorithm for correlated speech sources transmitting to a central receiver. Our method includes a perception-aware downstream task loss function that balances perceptual realism with task-specific performance. Experiments show significant PSNR improvements under bandwidth constraints over naive autoencoder methods in task-agnostic (19%) and task-aware (52%) settings. The method also approaches the theoretical upper bound, where all correlated sources are sent to a single encoder, especially in low-bandwidth scenarios. Additionally, we present a rate-distortion-perception trade-off curve, enabling adaptive decisions based on application-specific realism needs.
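The abstract reports its gains in PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE). The function name `psnr`, the default `peak=1.0` (waveform samples normalized to [-1, 1]), and the toy sample values are assumptions made for illustration; the listing does not say which representation or peak value the authors use.

```python
import math

def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    `peak` is the assumed maximum signal amplitude (hypothetical default of 1.0
    for normalized audio samples).
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and equal in length")
    # Mean squared error between the original and the reconstructed samples.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: distortion-free reconstruction
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: every reconstructed sample is off by 0.1.
clean = [0.0, 0.5, -0.5, 0.25]
noisy = [0.1, 0.4, -0.4, 0.35]
value = psnr(clean, noisy)  # roughly 20 dB, since MSE is about 0.01
```

Higher values mean lower distortion. PSNR captures only the distortion axis of the rate-distortion-perception trade-off mentioned above; it says nothing about perceptual realism.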

Related papers

Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning [60.650628083185616]
We propose a three-stage communication-aware distributed learning framework to improve training and inference efficiency. In Stage I, devices perform local multi-modal self-supervised learning to obtain shared and modality-specific encoders without device-server exchange. In Stage II, distributed fine-tuning with centralized evidential fusion calibrates per-modality uncertainty and reliably aggregates features distorted by noise or channel fading. In Stage III, an uncertainty-guided feedback mechanism selectively requests additional features for uncertain samples, optimizing the communication-accuracy tradeoff in the distributed setting.
arXiv Detail & Related papers (2026-01-21T12:38:02Z)
Semantic Channel Equalization Strategies for Deep Joint Source-Channel Coding [8.967618587731694]
Deep joint source-channel coding (DeepJSCC) has emerged as a powerful paradigm for end-to-end semantic communications. Existing DeepJSCC schemes assume a shared latent space at the transmitter (TX) and receiver (RX). This mismatch introduces "semantic noise", degrading reconstruction quality and downstream task performance.
arXiv Detail & Related papers (2025-10-06T10:29:07Z)
Prediction-Powered Communication with Distortion Guarantees [65.37485275954224]
We study a prediction-powered communication setting, in which devices communicate under zero-delay constraints with strict distortion guarantees. We propose two zero-delay compression algorithms leveraging online conformal prediction to provide per-sequence guarantees on the distortion of reconstructed sequences. Experiments on semantic text compression validate the approach, showing significant bit rate reductions.
arXiv Detail & Related papers (2025-09-29T07:19:39Z)
SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Joint source-channel coding systems (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z)
Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks [55.467288506826755]
Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks. Most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance. We propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme.
arXiv Detail & Related papers (2025-01-20T04:26:21Z)
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [27.049330099874396]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS).
arXiv Detail & Related papers (2024-07-26T02:34:25Z)
Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises [18.539501941328393]
This paper develops a latent diffusion model-enabled SemCom system to handle outliers in source data. A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter. An end-to-end consistency distillation strategy is used to distill the diffusion models trained in latent space.
arXiv Detail & Related papers (2024-06-09T23:39:31Z)
Collaborative Edge AI Inference over Cloud-RAN [37.3710464868215]
A cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors. We allow each RRH to receive local feature vectors from all devices over the same resource blocks simultaneously by leveraging an over-the-air computation (AirComp) technique. These aggregated feature vectors are quantized and transmitted to a central processor for further aggregation and downstream inference tasks.
arXiv Detail & Related papers (2024-04-09T04:26:16Z)
Streaming Audio-Visual Speech Recognition with Alignment Regularization [69.30185151873707]
We propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture. The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 dataset in offline and online setups, respectively.
arXiv Detail & Related papers (2022-11-03T20:20:47Z)
Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques. Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)
Learning Resilient Radio Resource Management Policies with Graph Neural Networks [124.89036526192268]
We formulate a resilient radio resource management problem with per-user minimum-capacity constraints. We show that we can parameterize the user selection and power control policies using a finite set of parameters. Thanks to such adaptation, our proposed method achieves a superior tradeoff between the average rate and the 5th percentile rate.
arXiv Detail & Related papers (2022-03-07T19:40:39Z)
Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach [3.983055670167878]
A low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing. It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth. We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding.
arXiv Detail & Related papers (2021-02-08T12:53:32Z)
Deep Reinforcement Learning for Resource Constrained Multiclass Scheduling in Wireless Networks [0.0]
In our setup, the available limited bandwidth resources are allocated in order to serve randomly arriving service demands. We propose a distributional Deep Deterministic Policy Gradient (DDPG) algorithm combined with Deep Sets to tackle the problem. Our proposed algorithm is tested on both synthetic and real data, showing consistent gains against state-of-the-art conventional methods.
arXiv Detail & Related papers (2020-11-27T09:49:38Z)
Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications. Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
Infomax Neural Joint Source-Channel Coding via Adversarial Bit Flip [41.28049430114734]
We propose a novel regularization method called Infomax Adversarial-Bit-Flip (IABF) to improve the stability and robustness of the neural joint source-channel coding scheme. Our IABF can achieve state-of-the-art performances on both compression and error correction benchmarks and outperform the baselines by a significant margin.
arXiv Detail & Related papers (2020-04-03T10:00:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.