Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
- URL: http://arxiv.org/abs/2507.04621v1
- Date: Mon, 07 Jul 2025 02:42:35 GMT
- Title: Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
- Authors: Yusong Zhang, Yuxuan Sun, Lei Guo, Wei Chen, Bo Ai, Deniz Gunduz
- Abstract summary: 6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real time. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC.
- Score: 21.082428220672696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 6G networks promise revolutionary immersive communication experiences, including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real time, which is extremely challenging over resource-limited wireless communication systems. Moreover, a joint understanding of the environment, context, and user intent is essential to deliver task-relevant content effectively. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC, which fully leverages the reasoning and generative capabilities of pre-trained foundation models for context-aware and task-oriented wireless communication. The MLLM-SC framework adopts a device-edge collaborative architecture. At the edge, an MLLM-empowered semantic guidance module analyzes multimodal inputs, user intents, and channel conditions to generate importance-aware attention maps that prioritize semantically critical information. An importance-aware semantic encoder and a resource-adaptive semantic decoder are jointly designed and optimized; they utilize the semantic guidance for adaptive bandwidth allocation and high-quality content reconstruction or generation. Extensive case studies on visual question answering for AR/VR applications and diffusion-driven image generation validate the effectiveness of MLLM-SC.
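The importance-aware bandwidth allocation described in the abstract can be illustrated with a minimal sketch: given a semantic attention map produced at the edge, a symbol budget is split across image patches in proportion to their importance scores. This is a hedged toy heuristic, not the paper's learned encoder; all function and variable names (`allocate_bandwidth`, `attention_map`, `total_symbols`) are hypothetical.

```python
import numpy as np

def allocate_bandwidth(attention_map, total_symbols, min_symbols=1):
    """Split a symbol budget across patches in proportion to importance.

    attention_map: non-negative importance score per patch (a stand-in for
    the output of an MLLM semantic guidance module); total_symbols: the
    channel budget. Returns an integer allocation summing to total_symbols.
    """
    scores = np.asarray(attention_map, dtype=float).ravel()
    n = scores.size
    # Guarantee every patch a minimal allocation, then share the remainder
    # proportionally to the normalized importance scores.
    spare = total_symbols - min_symbols * n
    total_score = scores.sum()
    weights = scores / total_score if total_score > 0 else np.full(n, 1.0 / n)
    alloc = min_symbols + np.floor(spare * weights).astype(int)
    # Hand symbols lost to flooring back to the most important patches.
    leftover = total_symbols - alloc.sum()
    for idx in np.argsort(-scores)[:leftover]:
        alloc[idx] += 1
    return alloc

# Example: 4 patches, the second is most salient for the downstream task.
budget = allocate_bandwidth([0.1, 0.6, 0.2, 0.1], total_symbols=100)
# The full budget is spent, and the salient patch gets the largest share.
assert budget.sum() == 100 and budget.argmax() == 1
```

The proportional rule captures the key idea, that semantically critical regions receive more channel resources, while the learned encoder/decoder pair in the paper would make this mapping adaptive to channel conditions as well.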
Related papers
- MGCR-Net:Multimodal Graph-Conditioned Vision-Language Reconstruction Network for Remote Sensing Change Detection [55.702662643521265]
We propose the multimodal graph-conditioned vision-language reconstruction network (MGCR-Net) to explore the semantic interaction capabilities of multimodal data. Experimental results on four public datasets demonstrate that MGCR achieves superior performance compared to mainstream CD methods.
arXiv Detail & Related papers (2025-08-03T02:50:08Z)
- Multi-Task Semantic Communications via Large Models [42.42961176008125]
We propose a LAM-based multi-task SemCom architecture, which includes an adaptive model compression strategy and a federated split fine-tuning approach. A retrieval-augmented generation scheme is implemented to synthesize the most recent local and global knowledge bases.
arXiv Detail & Related papers (2025-03-28T00:57:34Z)
- SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework [22.924064428134507]
Single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency. We propose a semantic-driven integrated multimodal sensing and communication framework to overcome these challenges.
arXiv Detail & Related papers (2025-03-11T01:04:42Z)
- Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
arXiv Detail & Related papers (2025-02-12T09:01:25Z)
- Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning [41.8826976666953]
We introduce semantic communication into a cellular vehicle-to-everything (C-V2X)-based autonomous vehicle platoon system. The paper proposes a distributed semantic-aware multi-modal resource allocation (SAMRA) algorithm based on multi-agent reinforcement learning (MARL), referred to as SAMRAMARL.
arXiv Detail & Related papers (2024-11-07T12:55:35Z)
- Integrating Pre-Trained Language Model with Physical Layer Communications [19.20941153929975]
We introduce a practical on-device AI communication framework, integrated with physical layer (PHY) communication functions.
Our framework incorporates end-to-end training with channel noise to enhance resilience, employs vector quantized variational autoencoders (VQ-VAE) for efficient and robust communication, and utilizes pre-trained encoder-decoder transformers for improved generalization capabilities.
arXiv Detail & Related papers (2024-02-18T17:27:51Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- Large AI Model Empowered Multimodal Semantic Communications [48.73159237649128]
We propose a Large AI Model-based Multimodal SC (LAMMSC) framework.
We first present the Conditional-based Multimodal Alignment (MMA) that enables the transformation between multimodal and unimodal data.
Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery.
Finally, we apply the generative adversarial network-based channel estimation (CGE) to estimate the wireless channel state information.
arXiv Detail & Related papers (2023-09-03T19:24:34Z)
- Enabling the Wireless Metaverse via Semantic Multiverse Communication [82.47169682083806]
Metaverse over wireless networks is an emerging use case of the sixth generation (6G) wireless systems.
We propose a novel semantic communication framework by decomposing the metaverse into human/machine agent-specific semantic multiverses (SMs).
An SM stored at each agent comprises a semantic encoder and a generator, leveraging recent advances in generative artificial intelligence (AI).
arXiv Detail & Related papers (2022-12-13T21:21:07Z)
- Communication-Efficient and Distributed Learning Over Wireless Networks: Principles and Applications [55.65768284748698]
Machine learning (ML) is a promising enabler for the fifth generation (5G) communication systems and beyond.
This article aims to provide a holistic overview of relevant communication and ML principles, and thereby present communication-efficient and distributed learning frameworks with selected use cases.
arXiv Detail & Related papers (2020-08-06T12:37:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.