Multi-Modal Self-Supervised Semantic Communication
- URL: http://arxiv.org/abs/2503.13940v1
- Date: Tue, 18 Mar 2025 06:13:02 GMT
- Title: Multi-Modal Self-Supervised Semantic Communication
- Authors: Hang Zhao, Hongru Li, Dongfang Xu, Shenghui Song, Khaled B. Letaief
- Abstract summary: We propose a multi-modal semantic communication system that leverages multi-modal self-supervised learning to enhance task-agnostic feature extraction. The proposed approach effectively captures both modality-invariant and modality-specific features while minimizing training-related communication overhead. The findings underscore the advantages of multi-modal self-supervised learning in semantic communication, paving the way for more efficient and scalable edge inference systems.
- Score: 52.76990720898666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic communication is emerging as a promising paradigm that focuses on the extraction and transmission of semantic meanings using deep learning techniques. While current research primarily addresses the reduction of semantic communication overhead, it often overlooks the training phase, which can incur significant communication costs in dynamic wireless environments. To address this challenge, we propose a multi-modal semantic communication system that leverages multi-modal self-supervised learning to enhance task-agnostic feature extraction. The proposed approach employs self-supervised learning during the pre-training phase to extract task-agnostic semantic features, followed by supervised fine-tuning for downstream tasks. This dual-phase strategy effectively captures both modality-invariant and modality-specific features while minimizing training-related communication overhead. Experimental results on the NYU Depth V2 dataset demonstrate that the proposed method significantly reduces training-related communication overhead while maintaining or exceeding the performance of existing supervised learning approaches. The findings underscore the advantages of multi-modal self-supervised learning in semantic communication, paving the way for more efficient and scalable edge inference systems.
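The dual-phase strategy described in the abstract can be illustrated with a minimal numeric sketch: a label-free alignment phase that maps two modalities into a shared task-agnostic embedding space, followed by a supervised head fitted on the frozen features. The encoders, the cross-modal prediction objective (a stand-in for the paper's self-supervised loss), and the synthetic data below are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared latent semantics observed through two noisy "sensors" (modalities),
# e.g. RGB and depth as in the NYU Depth V2 experiments.
n = 400
latent = rng.normal(size=(n, 3))
labels = (latent[:, 0] > 0).astype(int)            # downstream task labels
x_rgb = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(n, 8))
x_depth = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(n, 6))

# Phase 1 (self-supervised pre-training, no labels): align the depth
# encoder to the RGB embedding space via cross-modal prediction.
W_rgb = rng.normal(size=(8, 4))                    # fixed toy RGB encoder
z_rgb = x_rgb @ W_rgb
W_depth, *_ = np.linalg.lstsq(x_depth, z_rgb, rcond=None)
z_depth = x_depth @ W_depth                        # aligned depth embedding

# Phase 2 (supervised fine-tuning): a linear head on the frozen,
# task-agnostic multi-modal features.
feats = np.hstack([z_rgb, z_depth])
w_head, *_ = np.linalg.lstsq(feats, 2.0 * labels - 1.0, rcond=None)
pred = (feats @ w_head > 0).astype(int)
accuracy = (pred == labels).mean()
```

In a distributed setting, the point of phase 1 is that it needs only paired raw observations, so the label-dependent (and communication-heavy) training is confined to the small head in phase 2.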
Related papers
- Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition [17.790383360652704]
Training for few-shot multimodal dialogue intention recognition involves two interconnected tasks that can interfere with each other. This is attributed to knowledge interference stemming from the superposition of weight-matrix updates during training. We propose Knowledge-Decoupled Synergetic Learning, which transforms knowledge into interpretable rules while applying post-training to larger models.
arXiv Detail & Related papers (2025-03-06T08:28:44Z) - BMIP: Bi-directional Modality Interaction Prompt Learning for VLM [18.196058385987506]
We propose a novel prompt learning method called Bi-directional Modality Interaction Prompt (BMIP). BMIP weights bi-modal information by learning from the attention layer, enhancing trainability and inter-modal consistency compared to simple information-aggregation methods.
arXiv Detail & Related papers (2025-01-14T00:59:55Z) - Semantic Communication for Cooperative Perception using HARQ [51.148203799109304]
We leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework.
To counter the challenges posed by time-varying multipath fading, our approach incorporates orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies.
We introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ).
arXiv Detail & Related papers (2024-08-29T08:53:26Z) - Semantic Communication for Cooperative Multi-Task Processing over Wireless Networks [8.766411351797885]
We introduce the concept of a "semantic source", allowing multiple semantic interpretations from a single observation.
We formulate an end-to-end optimization problem that takes the communication channel into account.
Our findings highlight that cooperative multi-tasking is not always beneficial.
arXiv Detail & Related papers (2024-04-12T14:03:41Z) - Incomplete Multimodal Learning for Remote Sensing Data Fusion [12.822457129596824]
The mechanism of connecting multimodal signals through self-attention operation is a key factor in the success of multimodal Transformer networks in remote sensing data fusion tasks.
Traditional approaches assume access to all modalities during both training and inference, which can lead to severe degradation when dealing with modal-incomplete inputs in downstream applications.
Our proposed approach introduces a novel model for incomplete multimodal learning in the context of remote sensing data fusion.
arXiv Detail & Related papers (2023-04-22T12:16:52Z) - Cognitive Semantic Communication Systems Driven by Knowledge Graph:
Principle, Implementation, and Performance Evaluation [74.38561925376996]
Two cognitive semantic communication frameworks are proposed for the single-user and multiple-user communication scenarios.
An effective semantic correction algorithm is proposed by mining the inference rule from the knowledge graph.
For the multi-user cognitive semantic communication system, a message recovery algorithm is proposed to distinguish messages of different users.
arXiv Detail & Related papers (2023-03-15T12:01:43Z) - Emergent Quantized Communication [34.31732248872158]
We propose an alternative approach to achieve discrete communication -- quantization of communicated messages.
Message quantization allows us to train the model end-to-end, achieving superior performance in multiple setups.
arXiv Detail & Related papers (2022-11-04T12:39:45Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic segmentation.
arXiv Detail & Related papers (2022-06-21T17:40:55Z) - Common Language for Goal-Oriented Semantic Communications: A Curriculum Learning Framework [66.81698651016444]
A comprehensive semantic communications framework is proposed for enabling goal-oriented task execution.
A novel top-down framework that combines curriculum learning (CL) and reinforcement learning (RL) is proposed to solve this problem.
Simulation results show that the proposed CL method outperforms traditional RL in terms of convergence time, task execution time, and transmission cost during training.
arXiv Detail & Related papers (2021-11-15T19:13:55Z)
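The "Emergent Quantized Communication" entry above trains through a discrete channel by quantizing messages end-to-end. A minimal sketch of the generic trick, uniform rounding with a straight-through estimator, is below; the level count and the value-versus-gradient decomposition are illustrative assumptions, not that paper's exact scheme.

```python
import numpy as np

def quantize_ste(msg, levels=4):
    """Round a continuous message in [0, 1] to `levels` uniform levels.

    Forward pass: hard rounding. In an autograd framework the residual
    (hard - msg) would be detached, so gradients flow through `msg` as if
    quantization were the identity (the straight-through estimator).
    """
    hard = np.round(np.clip(msg, 0.0, 1.0) * (levels - 1)) / (levels - 1)
    return msg + (hard - msg)  # equals `hard` in value, identity in gradient

# A continuous 4-symbol message snapped to the grid {0, 1/3, 2/3, 1}.
m = np.array([0.10, 0.34, 0.51, 0.97])
q = quantize_ste(m, levels=4)
```

Because the forward pass emits only discrete levels, the channel carries a finite symbol set, while the identity-gradient convention keeps the sender trainable end-to-end.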
This list is automatically generated from the titles and abstracts of the papers on this site.