Related papers: A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning

URL: http://arxiv.org/abs/2505.09160v2
Date: Wed, 22 Oct 2025 00:52:25 GMT
Title: A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning
Authors: Berkay Guler, Giovanni Geraci, Hamid Jafarkhani
Abstract summary: ContraWiMAE is a transformer-based foundation model that unifies masked reconstruction and masked contrastive learning for wireless channel representation. Our key innovation is a new wireless-inspired contrastive objective that exploits the inherent characteristics of wireless environment, including noise, fading, and partial observability, as natural augmentation.
Score: 19.277001743060435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current applications of self-supervised learning to wireless channel representation often borrow paradigms developed for text and image processing, without fully addressing the unique characteristics and constraints of wireless communications. To bridge this gap, we introduce ContraWiMAE, Wireless Contrastive Masked Autoencoder, a transformer-based foundation model that unifies masked reconstruction and masked contrastive learning for wireless channel representation. Our key innovation is a new wireless-inspired contrastive objective that exploits the inherent characteristics of wireless environment, including noise, fading, and partial observability, as natural augmentation. Through extensive evaluation on unseen scenarios and conditions, we demonstrate our method's effectiveness in multiple downstream tasks, including cross-frequency beam selection, line-of-sight detection, and channel estimation. ContraWiMAE exhibits superior linear separability and adaptability in diverse wireless environments, demonstrating exceptional data efficiency and competitive performance compared with supervised baselines under challenging conditions. Comparative evaluations against a state-of-the-art wireless channel foundation model confirm the superior performance and data efficiency of our approach, highlighting its potential as a powerful baseline for future research in self-supervised wireless channel representation learning. To foster further work in this direction, we release the model weights and training pipeline for ContraWiMAE.

Related papers

A Multi-Modal Foundational Model for Wireless Communication and Sensing [5.101849923596286]
This work introduces a task-agnostic, multi-modal foundational model for physical-layer wireless systems. It learns transferable, physics-aware representations across heterogeneous modalities, enabling robust generalization across tasks and environments. Our evaluations demonstrate superior generalization, robustness to deployment shifts, and reduced data requirements compared to task-specific baselines.
arXiv Detail & Related papers (2026-02-03T21:03:23Z)
MMSense: Adapting Vision-based Foundation Model for Multi-task Multi-modal Wireless Sensing [7.577654996150275]
MMSense is a multi-modal, multi-task foundation model for unified wireless sensing. Our framework integrates image, radar, LiDAR, and textual data by transforming them into vision-compatible representations. A modality gating mechanism adaptively fuses these representations, while a vision-based large language model backbone enables unified feature alignment.
arXiv Detail & Related papers (2025-11-15T17:35:39Z)
SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations [68.9300049150948]
We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID). Existing data collection approaches yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. We present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstration skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood.
arXiv Detail & Related papers (2025-05-04T13:00:29Z)
A MIMO Wireless Channel Foundation Model via CIR-CSI Consistency [19.658024410165112]
This paper treats Channel State Information (CSI) and Channel Impulse Response (CIR) as naturally aligned multi-modal data. By effectively capturing the joint representations of both CIR and CSI, CSI-CLIP exhibits remarkable adaptability across scenarios.
arXiv Detail & Related papers (2025-02-17T16:13:40Z)
WirelessGPT: A Generative Pre-trained Multi-task Learning Framework for Wireless Communication [11.9521391877271]
This paper introduces WirelessGPT, a pioneering foundation model specifically designed for multi-task learning in wireless communication and sensing. With an initial parameter size of around 80 million, WirelessGPT demonstrates significant improvements over conventional methods and smaller AI models. As the first foundation model capable of supporting diverse tasks across different domains, WirelessGPT establishes a new benchmark.
arXiv Detail & Related papers (2025-02-08T12:38:56Z)
LaVin-DiT: Large Vision Diffusion Transformer [99.98106406059333]
LaVin-DiT is a scalable and unified foundation model designed to tackle over 20 computer vision tasks in a generative framework. We introduce key innovations to optimize generative performance for vision tasks. The model is scaled from 0.1B to 3.4B parameters, demonstrating substantial scalability and state-of-the-art performance across diverse vision tasks.
arXiv Detail & Related papers (2024-11-18T12:05:27Z)
Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning. In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling. The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications. Virtual reality (VR) transmission over wireless networks is data- and computation-intensive. We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
Physical-Layer Semantic-Aware Network for Zero-Shot Wireless Sensing [74.12670841657038]
Device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications. Data heterogeneity in wireless signals and data privacy regulation of distributed sensing have been considered as the major challenges that hinder the wide applications of wireless sensing in large area networking systems. We propose a novel zero-shot wireless sensing solution that allows models constructed in one or a limited number of locations to be directly transferred to other locations without any labeled data.
arXiv Detail & Related papers (2023-12-08T13:50:30Z)
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks. We find that their performances are sub-optimal or even lag far behind the single-task baseline. We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
Multi-Agent Feedback Enabled Neural Networks for Intelligent Communications [28.723523146324002]
In this paper, a novel multi-agent feedback enabled neural network (MAFENN) framework is proposed. The MAFENN framework is theoretically formulated into a three-player Feedback Stackelberg game, and the game is proved to converge to the Feedback Stackelberg equilibrium. To verify the MAFENN framework's feasibility in wireless communications, a multi-agent MAFENN based equalizer (MAFENN-E) is developed.
arXiv Detail & Related papers (2022-05-22T05:28:43Z)
Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods. The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead. Results reveal that the MBDL outperforms by a significant margin the SIC receiver with imperfect CSIR.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
Multi-task Learning Approach for Modulation and Wireless Signal Classification for 5G and Beyond: Edge Deployment via Model Compression [1.218340575383456]
Future communication networks must address the scarce spectrum to accommodate growth of heterogeneous wireless devices. We exploit the potential of deep neural networks based multi-task learning framework to simultaneously learn modulation and signal classification tasks. We provide a comprehensive heterogeneous wireless signals dataset for public use.
arXiv Detail & Related papers (2022-02-26T14:51:02Z)
ChaRRNets: Channel Robust Representation Networks for RF Fingerprinting [0.0]
We present complex-valued Convolutional Neural Networks (CNNs) for RF fingerprinting. We focus on the problem of fingerprinting wireless IoT devices in-the-wild using Deep Learning (DL) techniques.
arXiv Detail & Related papers (2021-05-08T03:03:21Z)
Distributed Learning in Wireless Networks: Recent Progress and Future Challenges [170.35951727508225]
Next-generation wireless networks will enable many machine learning (ML) tools and applications to analyze various types of data collected by edge devices. Distributed learning and inference techniques have been proposed as a means to enable edge devices to collaboratively train ML models without raw data exchanges. This paper provides a comprehensive study of how distributed learning can be efficiently and effectively deployed over wireless edge networks.
arXiv Detail & Related papers (2021-04-05T20:57:56Z)
Intelligent Reflecting Surface Aided Wireless Communications: A Tutorial [64.77665786141166]
Intelligent reflecting surface (IRS) is an enabling technology to engineer the radio signal propagation in wireless networks. IRS is capable of dynamically altering wireless channels to enhance the communication performance. Despite its great potential, IRS faces new challenges to be efficiently integrated into wireless networks.
arXiv Detail & Related papers (2020-07-06T13:59:09Z)
Harnessing Wireless Channels for Scalable and Privacy-Preserving Federated Learning [56.94644428312295]
Wireless connectivity is instrumental in enabling federated learning (FL). Channel randomness perturbs each worker's model update, while multiple workers' updates incur significant interference under limited bandwidth. In A-FADMM, all workers upload their model updates to the parameter server using a single channel via analog transmissions. This not only saves communication bandwidth, but also hides each worker's exact model update trajectory from any eavesdropper.
arXiv Detail & Related papers (2020-07-03T16:31:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.