Multi-Modal Semantic Communication
- URL: http://arxiv.org/abs/2512.15691v1
- Date: Wed, 17 Dec 2025 18:47:22 GMT
- Title: Multi-Modal Semantic Communication
- Authors: Matin Mortaheb, Erciyes Karakaya, Sennur Ulukus
- Abstract summary: We propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores. At the receiver, the patches are reconstructed and combined to preserve task-critical information.
- Score: 39.55262791529245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image patches at adaptive resolutions using independently trained encoder-decoder pairs, with total bitrate matching the channel capacity. At the receiver, the patches are reconstructed and combined to preserve task-critical information. This flexible and goal-driven design enables efficient semantic communication in complex and bandwidth-constrained environments.
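The abstract describes two steps: cross-modal attention that turns a text query and image patch features into soft relevance scores, and an algorithm that picks a resolution per patch so the total bitrate fits the channel capacity. The sketch below illustrates both under loudly stated assumptions and is not the authors' method. It assumes single-head scaled dot-product attention with a pooled text embedding as the query and patch features as keys, a hypothetical table `LEVEL_BITS` of per-patch bit costs standing in for the independently trained encoder-decoder pairs, and a greedy rule that gives the most relevant patches the highest resolution that still fits. The helper names `relevance_scores`, `allocate_levels`, and `bits_used` are illustrative only.

```python
import math
from typing import List, Sequence

# Hypothetical resolution levels: bits per patch at each level
# (index 0 = lowest resolution). Values are illustrative only.
LEVEL_BITS = [64, 256, 1024]


def relevance_scores(query: Sequence[float],
                     patches: Sequence[Sequence[float]]) -> List[float]:
    """Soft relevance of each image patch to a text query.

    Assumption: single-head scaled dot-product cross-attention, with a pooled
    text embedding as the query and patch features as the keys. The softmax
    over patches serves as the soft relevance score.
    """
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, p)) / math.sqrt(d) for p in patches]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def allocate_levels(scores: Sequence[float], budget_bits: int) -> List[int]:
    """Greedy per-patch resolution choice under a bit budget (an assumption,
    not the paper's algorithm).

    Every patch starts at the lowest level. Patches are then visited in order
    of decreasing relevance, and each gets the highest level whose extra bits
    still fit in the remaining budget.
    """
    n = len(scores)
    levels = [0] * n
    used = n * LEVEL_BITS[0]
    if used > budget_bits:
        raise ValueError("budget too small even for the lowest resolution")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    for i in order:  # most relevant patch first
        for lvl in range(len(LEVEL_BITS) - 1, 0, -1):  # try highest level first
            extra = LEVEL_BITS[lvl] - LEVEL_BITS[0]
            if used + extra <= budget_bits:
                levels[i] = lvl
                used += extra
                break
    return levels


def bits_used(levels: Sequence[int]) -> int:
    """Total bits spent for a given per-patch level assignment."""
    return sum(LEVEL_BITS[lvl] for lvl in levels)


if __name__ == "__main__":
    query = [1.0, 0.0]  # toy 2-D "text embedding"
    patches = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # toy patch features
    scores = relevance_scores(query, patches)
    levels = allocate_levels(scores, budget_bits=1500)
    print(levels, bits_used(levels))  # prints: [2, 0, 1, 0] 1408
```

With discrete levels the greedy rule guarantees only that the total stays at or below the budget, not that it matches the capacity exactly; a real system could refine the leftover bits or use finer-grained rate control.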
Related papers
- Referring Remote Sensing Image Segmentation with Cross-view Semantics Interaction Network [65.01521002836611]
We propose a parallel yet unified segmentation framework, the Cross-view Semantics Interaction Network (CSINet), to address existing limitations. Motivated by how humans observe targets of interest, the network orchestrates visual cues from remote and close distances to make synergistic predictions. At every encoding stage, a Cross-View Window-attention module (CVWin) supplies global and local semantics to the close-view and remote-view branch features.
Date: 2025-08-02T11:57:56Z
- Task-Adaptive Semantic Communications with Controllable Diffusion-based Data Regeneration [45.55410059471241]
Next-generation networking shifts from bit-wise data delivery to conveying semantic meaning for bandwidth efficiency. This work presents a novel task-adaptive semantic communication framework based on diffusion models. Test results demonstrate the efficacy of the proposed method in adaptively preserving task-relevant information for semantic communications.
Date: 2025-05-12T18:23:53Z
- Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
Date: 2025-02-12T09:01:25Z
- Efficient Semantic Communication Through Transformer-Aided Compression [31.285983939625098]
We introduce a channel-aware adaptive framework for semantic communication. By employing vision transformers, we interpret the attention mask as a measure of the semantic contents of the patches. Our method enhances communication efficiency by adapting the encoding resolution to the content's relevance.
Date: 2024-12-02T18:57:28Z
- Toward Real-Time Edge AI: Model-Agnostic Task-Oriented Communication with Visual Feature Alignment [23.796344455232227]
Task-oriented communication presents a promising approach to improve the communication efficiency of edge inference systems. Real-time applications face practical challenges, such as incomplete coverage and potential malfunctions of edge servers. This study introduces a novel framework that utilizes shared anchor data across diverse systems.
Date: 2024-12-01T15:52:05Z
- Editable-DeepSC: Reliable Cross-Modal Semantic Communications for Facial Editing [56.136971047286956]
We propose Editable-DeepSC, a novel cross-modal semantic communication approach for facial editing. Experiments indicate that Editable-DeepSC can achieve superior edits while significantly saving transmission bandwidth.
Date: 2024-11-24T04:07:33Z
- Transformer-Aided Semantic Communications [28.63893944806149]
We employ vision transformers specifically for the purpose of compression and compact representation of the input image.
Through the use of the attention mechanism inherent in transformers, we create an attention mask.
We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset.
Date: 2024-05-02T17:50:53Z
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
A federated learning-based semantic communication (FLSC) framework is proposed for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
A channel state information (CSI)-based multiple-input multiple-output (MIMO) transmission module is designed to combat channel fading and noise.
Date: 2023-08-07T16:32:14Z
- Semantic-Native Communication: A Simplicial Complex Perspective [50.099494681671224]
We study semantic communication from a topological space perspective.
A transmitter first maps its data into a $k$-order simplicial complex and then learns its high-order correlations.
The receiver decodes the structure and infers the missing or distorted data.
Date: 2022-10-30T22:33:44Z
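The Semantic-Native Communication summary says the transmitter maps its data into a $k$-order simplicial complex but does not say how. One standard construction, used here purely for illustration and not confirmed as that paper's method, is the clique (flag) complex of a graph: every set of pairwise-connected nodes becomes a simplex. The sketch reads "$k$-order" as "simplices up to dimension $k$", which is an assumption, and `clique_complex` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Simplex = Tuple[int, ...]


def clique_complex(nodes: Iterable[int], edges: Iterable[Tuple[int, int]],
                   k: int) -> Dict[int, List[Simplex]]:
    """Simplices of dimension 0..k of the clique (flag) complex of a graph.

    A set of d + 1 nodes is a d-simplex when every pair of them is joined by
    an edge. Simplices are stored as sorted tuples, so each is listed once.
    """
    adj: Dict[int, Set[int]] = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    complex_: Dict[int, List[Simplex]] = {0: [(v,) for v in sorted(adj)]}
    for dim in range(1, k + 1):
        grown: List[Simplex] = []
        for s in complex_[dim - 1]:
            # Nodes adjacent to every vertex of s can extend it by one dimension;
            # requiring w > s[-1] keeps tuples sorted and avoids duplicates.
            common = set.intersection(*(adj[v] for v in s))
            grown.extend(s + (w,) for w in sorted(common) if w > s[-1])
        if not grown:
            break  # no simplices of this dimension, so none higher either
        complex_[dim] = grown
    return complex_


if __name__ == "__main__":
    # Toy "data": a triangle 0-1-2 plus a pendant edge 2-3.
    result = clique_complex(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)], k=2)
    for dim, simplices in result.items():
        print(dim, simplices)
    # prints:
    # 0 [(0,), (1,), (2,), (3,)]
    # 1 [(0, 1), (0, 2), (1, 2), (2, 3)]
    # 2 [(0, 1, 2)]
```

The clique complex is convenient because it is fully determined by pairwise relations, so higher-order simplices capture exactly the groups of mutually correlated items; a learned or data-driven complex, as the paper may use, need not have this property.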