Related papers: Multi-Modal Multi-Task Semantic Communication: A Distributed Information Bottleneck Perspective

URL: http://arxiv.org/abs/2510.04000v2
Date: Sun, 12 Oct 2025 05:59:05 GMT
Title: Multi-Modal Multi-Task Semantic Communication: A Distributed Information Bottleneck Perspective
Authors: Yujie Zhou, Yiwei Liao, Cheng Peng, Rulong Wang, Yong Xiao, Yingyu Li, Guangming Shi
Abstract summary: Existing AI-based coding schemes for multi-modal multi-task SemCom often require transmitters with full-modal data to participate in all tasks. We propose PoM$^2$-DIB, a novel framework that extends the distributed information bottleneck theory to address this problem. We show that PoM$^2$-DIB achieves high inference quality compared to full-participation baselines in various tasks under physical limits.
Score: 38.62986066789684
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic communication (SemCom) shifts the focus from data transmission to meaning delivery, enabling efficient and intelligent communication. Existing AI-based coding schemes for multi-modal multi-task SemCom often require transmitters with full-modal data to participate in all receivers' tasks, which leads to redundant transmissions and conflicts with the physical limits of channel capacity and computational capability. In this paper, we propose PoM$^2$-DIB, a novel framework that extends the distributed information bottleneck (DIB) theory to address this problem. Unlike the typical DIB, this framework introduces modality selection as an additional key design variable, enabling a more flexible tradeoff between communication rate and inference quality. This extension selects only the most relevant modalities for task participation, adhering to the physical constraints, while following efficient DIB-based coding. To optimize selection and coding end-to-end, we relax modality selection into a probabilistic form, allowing the use of score function estimation with common randomness to enable optimizable coordinated decisions across distributed devices. Experimental results on public datasets verify that PoM$^2$-DIB achieves high inference quality compared to full-participation baselines in various tasks under physical limits.
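The abstract's idea of relaxing modality selection into a probabilistic form and optimizing it with score function estimation under common randomness can be illustrated with a minimal sketch. This is not the paper's algorithm: the independent Bernoulli parameterization, the names `sample_selection`, `score_function_grad`, and `toy_loss`, and the toy loss values are all assumptions made for illustration. A shared seed stands in for common randomness, so every device that knows the seed draws the same selection mask.

```python
import math
import random


def sample_selection(logits, seed):
    """Draw a binary modality mask from independent Bernoulli(sigmoid(logit)).

    Assumption: a shared integer seed models common randomness, so all
    devices holding the same seed obtain the identical mask.
    """
    rng = random.Random(seed)
    probs = [1.0 / (1.0 + math.exp(-l)) for l in logits]
    mask = [1 if rng.random() < p else 0 for p in probs]
    return mask, probs


def score_function_grad(logits, loss_fn, seeds):
    """Monte Carlo score-function (REINFORCE) estimate of
    d E[loss(mask)] / d logits, averaged over one sample per seed."""
    seeds = list(seeds)
    grad = [0.0] * len(logits)
    for seed in seeds:
        mask, probs = sample_selection(logits, seed)
        loss = loss_fn(mask)
        # For Bernoulli(p) with p = sigmoid(logit):
        # d log P(mask_i) / d logit_i = mask_i - p_i.
        for i, (m, p) in enumerate(zip(mask, probs)):
            grad[i] += loss * (m - p) / len(seeds)
    return grad


def toy_loss(mask):
    """Hypothetical objective: modality 0 removes all task loss,
    modality 1 is useless, and each selected modality costs 0.2 in rate."""
    task_loss = 1.0 - mask[0]
    rate_cost = 0.2 * sum(mask)
    return task_loss + rate_cost
```

For this toy loss the expected value is 1 - 0.8 p_0 + 0.2 p_1, so at logits `[0.0, 0.0]` the exact gradient with respect to the logits is (-0.2, 0.05). Calling `score_function_grad([0.0, 0.0], toy_loss, range(20000))` should return an estimate close to these values, which pushes the selection probability of the useful modality up and that of the useless one down under gradient descent.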

Related papers

Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach [55.861432910722186]
UniToCom is a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. We propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information. We employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens.
arXiv Detail & Related papers (2025-07-02T14:03:01Z)
Task-Oriented Multimodal Token Transmission in Resource-Constrained Multiuser Networks [19.42660454288912]
We propose a task-oriented multimodal token transmission scheme for efficient multimodal information fusion and utilization. To improve inter-modal consistency and task-relevant token transmission, we design a two-stage training algorithm. We also formulate a weighted-sum optimization problem over latency and inference performance.
arXiv Detail & Related papers (2025-05-06T14:17:05Z)
Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
arXiv Detail & Related papers (2025-02-12T09:01:25Z)
FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication [11.254610576923204]
We propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS). The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
arXiv Detail & Related papers (2023-10-10T22:23:27Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Multi-Receiver Task-Oriented Communications via Multi-Task Deep Learning [49.83882366499547]
This paper studies task-oriented, otherwise known as goal-oriented, communications in a setting where a transmitter communicates with multiple receivers. A multi-task deep learning approach is presented for joint optimization of completing multiple tasks and communicating with multiple receivers.
arXiv Detail & Related papers (2023-08-14T01:34:34Z)
Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
A federated learning-based semantic communication (FLSC) framework is proposed for multi-task distributed image transmission with IoT devices. Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator. A channel state information-based multiple-input multiple-output transmission module is designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
Rate-Adaptive Coding Mechanism for Semantic Communications With Multi-Modal Data [23.597759255020296]
We propose a distributed multi-modal semantic communication framework incorporating the conventional channel encoder/decoder. We establish a general rate-adaptive coding mechanism for various types of multi-modal semantic tasks. Numerical results show that the proposed mechanism fares better than both conventional communication and existing semantic communication systems.
arXiv Detail & Related papers (2023-05-18T07:31:37Z)
Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC). We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
Task-Oriented Communication for Multi-Device Cooperative Edge Inference [14.249444124834719]
Cooperative edge inference can overcome the limited sensing capability of a single device, but it substantially increases the communication overhead and may incur excessive latency. We propose a learning-based communication scheme that optimizes local feature extraction and distributed feature encoding in a task-oriented manner.
arXiv Detail & Related papers (2021-09-01T03:56:20Z)
Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach [3.983055670167878]
A low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing. It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth. We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding.
arXiv Detail & Related papers (2021-02-08T12:53:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.