A Multi-Modal Foundational Model for Wireless Communication and Sensing
- URL: http://arxiv.org/abs/2602.04016v1
- Date: Tue, 03 Feb 2026 21:03:23 GMT
- Title: A Multi-Modal Foundational Model for Wireless Communication and Sensing
- Authors: Vahid Yazdnian, Yasaman Ghasempour
- Abstract summary: This work introduces a task-agnostic, multi-modal foundational model for physical-layer wireless systems. It learns transferable, physics-aware representations across heterogeneous modalities, enabling robust generalization across tasks and environments. Our evaluations demonstrate superior generalization, robustness to deployment shifts, and reduced data requirements compared to task-specific baselines.
- Score: 5.101849923596286
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Artificial intelligence is a key enabler for next-generation wireless communication and sensing. Yet, today's learning-based wireless techniques do not generalize well: most models are task-specific, environment-dependent, and limited to narrow sensing modalities, requiring costly retraining when deployed in new scenarios. This work introduces a task-agnostic, multi-modal foundational model for physical-layer wireless systems that learns transferable, physics-aware representations across heterogeneous modalities, enabling robust generalization across tasks and environments. Our framework employs a physics-guided self-supervised pretraining strategy incorporating a dedicated physical token to capture cross-modal physical correspondences governed by electromagnetic propagation. The learned representations enable efficient adaptation to diverse downstream tasks, including massive multi-antenna optimization, wireless channel estimation, and device localization, using limited labeled data. Our extensive evaluations demonstrate superior generalization, robustness to deployment shifts, and reduced data requirements compared to task-specific baselines.
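The abstract's central mechanism is a dedicated physical token prepended to the multi-modal token sequence during self-supervised pretraining. The paper does not publish code, so the following is only a minimal NumPy shape sketch of that idea; the modality names ("csi", "radar"), dimensions, and random projections standing in for learned encoders are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_modalities(modalities, d_model):
    """Project each modality's raw samples into a shared token space.

    `modalities` maps a name (e.g. "csi", "radar") to an array of shape
    (num_tokens, feat_dim). The random projections here are stand-ins
    for learned per-modality encoders.
    """
    tokens = []
    for name, x in modalities.items():
        w = rng.standard_normal((x.shape[1], d_model)) / np.sqrt(x.shape[1])
        tokens.append(x @ w)
    return np.concatenate(tokens, axis=0)

def prepend_physical_token(tokens, physical_token):
    """Prepend a dedicated token intended to absorb cross-modal physical
    correspondences during self-supervised pretraining."""
    return np.vstack([physical_token, tokens])

d_model = 16
modalities = {
    "csi":   rng.standard_normal((8, 32)),   # channel-state tokens (assumed)
    "radar": rng.standard_normal((4, 64)),   # radar-return tokens (assumed)
}
physical_token = rng.standard_normal((1, d_model))  # learned in a real model
seq = prepend_physical_token(embed_modalities(modalities, d_model),
                             physical_token)
print(seq.shape)  # (1 + 8 + 4, 16)
```

In a full implementation this sequence would feed a transformer encoder, and the physical token's output embedding would carry the cross-modal summary used by downstream heads.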
Related papers
- Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning [7.900374101465939]
Unmanned Aerial Vehicles (UAVs) are becoming increasingly essential for wireless network services, particularly for data harvesting tasks. Existing algorithms suffer from limited training data, which hampers their performance in highly dynamic environments. We propose an attention-based Multi-Objective Reinforcement Learning architecture that explicitly handles the trade-off between data collection and energy consumption in urban environments.
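The data-collection/energy trade-off above is the classic multi-objective RL setting. As a hedged illustration only (the paper's actual method uses attention-based architectures, not this), a linear scalarization shows how a preference weight selects different trajectories:

```python
import numpy as np

def scalarized_reward(data_collected, energy_used, w=0.5):
    """Linear scalarization of the data/energy trade-off.

    `w` in [0, 1] selects a point on the Pareto front:
    w=1 rewards pure data collection, w=0 pure energy saving.
    All numbers below are illustrative.
    """
    return w * data_collected - (1.0 - w) * energy_used

# A data-hungry preference ranks the greedy trajectory higher...
assert scalarized_reward(10.0, 8.0, w=0.9) > scalarized_reward(6.0, 2.0, w=0.9)
# ...while an energy-conscious preference reverses the ranking.
assert scalarized_reward(10.0, 8.0, w=0.2) < scalarized_reward(6.0, 2.0, w=0.2)
```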
arXiv Detail & Related papers (2026-01-20T15:55:11Z) - Sensing and Understanding the World over Air: A Large Multimodal Model for Mobile Networks [59.23869884913339]
Wireless-native multi-modal large models (WMLMs) can sense and understand the physical world through multi-modal data. We constructed a GPT-style WMLM model and trained it on a real-world large-scale dataset, leveraging wireless signals as an anchor modality for contrastive learning.
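"Wireless signals as an anchor modality for contrastive learning" means aligned pairs (wireless embedding, other-modality embedding) are pulled together and mismatched pairs pushed apart. A minimal InfoNCE-style sketch of that objective, with random toy embeddings as assumptions rather than the paper's actual encoders:

```python
import numpy as np

def info_nce(anchor, other, temperature=0.1):
    """Symmetric InfoNCE-style loss: row i of `anchor` (wireless) should
    match row i of `other` (e.g. vision) and repel all other rows."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    b = other / np.linalg.norm(other, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

rng = np.random.default_rng(1)
wireless = rng.standard_normal((4, 8))
vision_aligned = wireless + 0.01 * rng.standard_normal((4, 8))
vision_random = rng.standard_normal((4, 8))

# Aligned cross-modal pairs yield a lower contrastive loss.
assert info_nce(wireless, vision_aligned) < info_nce(wireless, vision_random)
```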
arXiv Detail & Related papers (2025-11-17T07:33:46Z) - MMSense: Adapting Vision-based Foundation Model for Multi-task Multi-modal Wireless Sensing [7.577654996150275]
MMSense is a multi-modal, multi-task foundation model for unified wireless sensing. Our framework integrates image, radar, LiDAR, and textual data by transforming them into vision-compatible representations. A modality gating mechanism adaptively fuses these representations, while a vision-based large language model backbone enables unified feature alignment.
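A "modality gating mechanism" typically means a learned softmax weight per modality applied before fusion. This is a minimal sketch of that pattern under that assumption; in MMSense the gate scores would be predicted from the inputs, not supplied by hand as here:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_fusion(features, gate_logits):
    """Fuse per-modality feature vectors with adaptive softmax weights.

    `features`: dict mapping modality name -> (d,) feature vector.
    `gate_logits`: dict mapping modality name -> scalar gate score.
    """
    names = sorted(features)
    w = softmax(np.array([gate_logits[n] for n in names]))
    fused = sum(wi * features[n] for wi, n in zip(w, names))
    return fused, dict(zip(names, w))

feats = {"image": np.ones(3), "radar": 2 * np.ones(3), "lidar": 3 * np.ones(3)}
fused, weights = gated_fusion(feats, {"image": 0.0, "radar": 0.0, "lidar": 0.0})
# Equal logits -> equal weights -> fused is the plain average.
print(fused)  # [2. 2. 2.]
```

Raising one modality's logit shifts the fused vector toward that modality, which is the adaptivity the gate provides.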
arXiv Detail & Related papers (2025-11-15T17:35:39Z) - Multi-Modal Manipulation via Multi-Modal Policy Consensus [62.49978559936122]
We propose a new approach to integrate diverse sensory modalities for robotic manipulation. Our method factorizes the policy into a set of diffusion models, each specialized for a single representation. We evaluate our approach on simulated manipulation tasks in RLBench, as well as real-world tasks such as occluded object picking, in-hand spoon reorientation, and puzzle insertion.
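The "policy consensus" over factorized diffusion models amounts to combining per-modality denoising directions at each sampling step. A toy sketch of that combination rule, with hand-built linear "experts" standing in for actual diffusion score networks (an assumption for illustration, not the paper's architecture):

```python
import numpy as np

def consensus_step(action, scores, weights=None):
    """One denoising-style step: several per-modality score estimates
    are averaged into a consensus update. `scores` is a list of
    callables mapping the current action to a denoising direction."""
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    direction = sum(w * s(action) for w, s in zip(weights, scores))
    return action + 0.1 * direction  # step size is illustrative

# Two toy "experts" that each pull the action toward a different target.
def toward(target):
    return lambda a: target - a

a = np.zeros(2)
for _ in range(100):
    a = consensus_step(a, [toward(np.array([1.0, 0.0])),
                           toward(np.array([0.0, 1.0]))])
print(np.round(a, 2))  # converges toward the midpoint [0.5, 0.5]
```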
arXiv Detail & Related papers (2025-09-27T19:43:04Z) - A Wireless Foundation Model for Multi-Task Prediction [50.21098141769079]
We propose a unified foundation model for multi-task prediction in wireless networks that supports diverse prediction intervals. After training on large-scale datasets, the proposed foundation model demonstrates strong generalization to unseen scenarios and zero-shot performance on new tasks.
arXiv Detail & Related papers (2025-07-08T12:37:55Z) - WirelessGPT: A Generative Pre-trained Multi-task Learning Framework for Wireless Communication [11.9521391877271]
This paper introduces WirelessGPT, a pioneering foundation model specifically designed for multi-task learning in wireless communication and sensing. With an initial parameter size of around 80 million, WirelessGPT demonstrates significant improvements over conventional methods and smaller AI models. As the first foundation model capable of supporting diverse tasks across different domains, WirelessGPT establishes a new benchmark.
arXiv Detail & Related papers (2025-02-08T12:38:56Z) - Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large language models (LLMs) have driven profound transformations in wireless networks. Within wireless environments, the training of LLMs faces significant challenges related to security and privacy. This paper presents a systematic analysis of the training stages of LLMs in wireless networks, including pre-training, instruction tuning, and alignment tuning.
arXiv Detail & Related papers (2024-04-20T02:30:21Z) - Physical-Layer Semantic-Aware Network for Zero-Shot Wireless Sensing [74.12670841657038]
Device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications.
Data heterogeneity in wireless signals and data privacy regulation of distributed sensing have been considered as the major challenges that hinder the wide applications of wireless sensing in large area networking systems.
We propose a novel zero-shot wireless sensing solution that allows models constructed in one or a limited number of locations to be directly transferred to other locations without any labeled data.
arXiv Detail & Related papers (2023-12-08T13:50:30Z) - Pervasive Machine Learning for Smart Radio Environments Enabled by Reconfigurable Intelligent Surfaces [56.35676570414731]
The emerging technology of Reconfigurable Intelligent Surfaces (RISs) is provisioned as an enabler of smart wireless environments.
RISs offer a highly scalable, low-cost, hardware-efficient, and almost energy-neutral solution for dynamic control of the propagation of electromagnetic signals over the wireless medium.
One of the major challenges with the envisioned dense deployment of RISs in such reconfigurable radio environments is the efficient configuration of multiple metasurfaces.
arXiv Detail & Related papers (2022-05-08T06:21:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.