Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
- URL: http://arxiv.org/abs/2507.04621v1
- Date: Mon, 07 Jul 2025 02:42:35 GMT
- Title: Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
- Authors: Yusong Zhang, Yuxuan Sun, Lei Guo, Wei Chen, Bo Ai, Deniz Gunduz
- Abstract summary: 6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real time. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC.
- Score: 21.082428220672696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 6G networks promise revolutionary immersive communication experiences, including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real time, which is extremely challenging over resource-limited wireless communication systems. Moreover, a joint understanding of the environment, context, and user intent is essential to deliver task-relevant content effectively. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC, which fully leverages the reasoning and generative capabilities of pre-trained foundation models for context-aware and task-oriented wireless communication. The MLLM-SC framework adopts a device-edge collaborative architecture. At the edge, an MLLM-empowered semantic guidance module analyzes multimodal inputs, user intents, and channel conditions to generate importance-aware attention maps that prioritize semantically critical information. An importance-aware semantic encoder and a resource-adaptive semantic decoder are jointly designed and optimized; they utilize the semantic guidance for adaptive bandwidth allocation and high-quality content reconstruction or generation. Extensive case studies on visual question answering for AR/VR applications and diffusion-driven image generation validate the effectiveness of MLLM-SC.
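The importance-aware bandwidth allocation described in the abstract can be illustrated with a minimal sketch: given a semantic attention map produced at the edge, a symbol budget is split across image patches in proportion to their importance scores. This is a hedged toy heuristic, not the paper's learned encoder; all function and variable names (`allocate_bandwidth`, `attention_map`, `total_symbols`) are hypothetical.

```python
import numpy as np

def allocate_bandwidth(attention_map, total_symbols, min_symbols=1):
    """Split a symbol budget across patches in proportion to importance.

    attention_map: non-negative importance score per patch (a stand-in for
    the output of an MLLM semantic guidance module); total_symbols: the
    channel budget. Returns an integer allocation summing to total_symbols.
    """
    scores = np.asarray(attention_map, dtype=float).ravel()
    n = scores.size
    # Guarantee every patch a minimal allocation, then share the remainder
    # proportionally to the normalized importance scores.
    spare = total_symbols - min_symbols * n
    total_score = scores.sum()
    weights = scores / total_score if total_score > 0 else np.full(n, 1.0 / n)
    alloc = min_symbols + np.floor(spare * weights).astype(int)
    # Hand symbols lost to flooring back to the most important patches.
    leftover = total_symbols - alloc.sum()
    for idx in np.argsort(-scores)[:leftover]:
        alloc[idx] += 1
    return alloc

# Example: 4 patches, the second is most salient for the downstream task.
budget = allocate_bandwidth([0.1, 0.6, 0.2, 0.1], total_symbols=100)
# The full budget is spent, and the salient patch gets the largest share.
assert budget.sum() == 100 and budget.argmax() == 1
```

The proportional rule captures the key idea, that semantically critical regions receive more channel resources, while the learned encoder/decoder pair in the paper would make this mapping adaptive to channel conditions as well.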
Related papers
- MGCR-Net:Multimodal Graph-Conditioned Vision-Language Reconstruction Network for Remote Sensing Change Detection [55.702662643521265]
We propose the multimodal graph-conditioned vision-language reconstruction network (MGCR-Net) to explore the semantic interaction capabilities of multimodal data. Experimental results on four public datasets demonstrate that MGCR achieves superior performance compared to mainstream CD methods.
arXiv Detail & Related papers (2025-08-03T02:50:08Z)
- Multi-Task Semantic Communications via Large Models [42.42961176008125]
We propose a LAM-based multi-task SemCom architecture, which includes an adaptive model compression strategy and a federated split fine-tuning approach. A retrieval-augmented generation scheme is implemented to synthesize the most recent local and global knowledge bases.
arXiv Detail & Related papers (2025-03-28T00:57:34Z)
- SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework [22.924064428134507]
Single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency. We propose a semantic-driven integrated multimodal sensing and communication framework to overcome these challenges.
arXiv Detail & Related papers (2025-03-11T01:04:42Z)
- Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
arXiv Detail & Related papers (2025-02-12T09:01:25Z)
- Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning [41.8826976666953]
We introduce semantic communication into a cellular vehicle-to-everything (C-V2X)-based autonomous vehicle platoon system. The paper proposes a distributed semantic-aware multi-modal resource allocation (SAMRA) algorithm based on multi-agent reinforcement learning (MARL), referred to as SAMRAMARL.
arXiv Detail & Related papers (2024-11-07T12:55:35Z)
- Integrating Pre-Trained Language Model with Physical Layer Communications [19.20941153929975]
We introduce a practical on-device AI communication framework, integrated with physical layer (PHY) communication functions.
Our framework incorporates end-to-end training with channel noise to enhance resilience, employs vector quantized variational autoencoders (VQ-VAE) for efficient and robust communication, and utilizes pre-trained encoder-decoder transformers for improved generalization capabilities.
arXiv Detail & Related papers (2024-02-18T17:27:51Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- Large AI Model Empowered Multimodal Semantic Communications [48.73159237649128]
We propose a Large AI Model-based Multimodal SC (LAMMSC) framework.
We first present the Conditional-based Multimodal Alignment (MMA) that enables the transformation between multimodal and unimodal data.
Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery.
Finally, we apply the generative adversarial network-based channel estimation (CGE) to estimate the wireless channel state information.
arXiv Detail & Related papers (2023-09-03T19:24:34Z)
- Enabling the Wireless Metaverse via Semantic Multiverse Communication [82.47169682083806]
Metaverse over wireless networks is an emerging use case of the sixth generation (6G) wireless systems.
We propose a novel semantic communication framework by decomposing the metaverse into human/machine agent-specific semantic multiverses (SMs).
An SM stored at each agent comprises a semantic encoder and a generator, leveraging recent advances in generative artificial intelligence (AI).
arXiv Detail & Related papers (2022-12-13T21:21:07Z)
- Communication-Efficient and Distributed Learning Over Wireless Networks: Principles and Applications [55.65768284748698]
Machine learning (ML) is a promising enabler for the fifth generation (5G) communication systems and beyond.
This article aims to provide a holistic overview of relevant communication and ML principles, and thereby present communication-efficient and distributed learning frameworks with selected use cases.
arXiv Detail & Related papers (2020-08-06T12:37:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.