Related papers: Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing

Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing

URL: http://arxiv.org/abs/2511.18792v1
Date: Mon, 24 Nov 2025 05:46:29 GMT
Title: Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing
Authors: Cheng Jiang, Yihe Yan, Yanxiang Wang, Chun Tung Chou, Wen Hu,
Abstract summary: Wi-Fi sensing offers a compelling, privacy-preserving alternative to cameras.<n>Models trained in one setup fail to generalize to new environments, hardware, or users.<n>Our study pretrains and evaluates models on over 1.3 million samples extracted from 14 datasets.
Score: 2.5495319679484054
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Wi-Fi sensing offers a compelling, privacy-preserving alternative to cameras, its practical utility has been fundamentally undermined by a lack of robustness across domains. Models trained in one setup fail to generalize to new environments, hardware, or users, a critical "domain shift" problem exacerbated by modest, fragmented public datasets. We shift from this limited paradigm and apply a foundation model approach, leveraging Masked Autoencoding (MAE) style pretraining on the largest and most heterogeneous Wi-Fi CSI datasets collection assembled to date. Our study pretrains and evaluates models on over 1.3 million samples extracted from 14 datasets, collected using 4 distinct devices across the 2.4/5/6 GHz bands and bandwidths from 20 to 160 MHz. Our large-scale evaluation is the first to systematically disentangle the impacts of data diversity versus model capacity on cross-domain performance. The results establish scaling trends on Wi-Fi CSI sensing. First, our experiments show log-linear improvements in unseen domain performance as the amount of pretraining data increases, suggesting that data scale and diversity are key to domain generalization. Second, based on the current data volume, larger model can only provide marginal gains for cross-domain performance, indicating that data, rather than model capacity, is the current bottleneck for Wi-Fi sensing generalization. Finally, we conduct a series of cross-domain evaluations on human activity recognition, human gesture recognition and user identification tasks. The results show that the large-scale pretraining improves cross-domain accuracy ranging from 2.2% to 15.7%, compared to the supervised learning baseline. Overall, our findings provide insightful direction for designing future Wi-Fi sensing systems that can eventually be robust enough for real-world deployment.

Related papers

RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z)
Self-Supervised Transformer-based Contrastive Learning for Intrusion Detection Systems [1.1265248232450553]
This paper proposes a self-supervised contrastive learning approach for generalizable intrusion detection on raw packet sequences.<n>Our framework exhibits better performance in comparison to existing NetFlow self-supervised methods.<n>Our model provides a strong baseline for supervised intrusion detection with limited labeled data.
arXiv Detail & Related papers (2025-05-12T13:42:00Z)
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.<n>Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.<n>We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
How Important are Data Augmentations to Close the Domain Gap for Object Detection in Orbit? [15.550663626482903]
We investigate the efficacy of data augmentations to close the domain gap in spaceborne computer vision. We propose two novel data augmentations specifically developed to emulate the visual effects observed in orbital imagery.
arXiv Detail & Related papers (2024-10-21T08:24:46Z)
FUSED-Net: Detecting Traffic Signs with Limited Data [2.111102681327218]
We present 'FUSED-Net', built-upon Faster RCNN for traffic sign detection.<n>Unlike traditional approaches, we keep all parameters unfrozen during training, enabling FUSED-Net to learn from limited samples.<n>We achieve 2.4x, 2.2x, 1.5x, and 1.3x improvements of mAP in 1-shot, 3-shot, 5-shot, and 10-shot scenarios, respectively.
arXiv Detail & Related papers (2024-09-23T09:34:42Z)
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a. Federated Anomaly Detection framework named PeFAD with the increasing privacy concerns. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
Data Augmentation Techniques for Cross-Domain WiFi CSI-based Human Activity Recognition [1.7404865362620803]
WiFi Channel State Information (CSI) enables contactless and visual privacy-preserving sensing in indoor environments. Poor model generalization, due to varying environmental conditions and sensing hardware, is a well-known problem in this space. Data augmentation techniques commonly used in image-based learning are applied to WiFi CSI to investigate their effects on model generalization performance.
arXiv Detail & Related papers (2024-01-01T22:27:59Z)
Physical-Layer Semantic-Aware Network for Zero-Shot Wireless Sensing [74.12670841657038]
Device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications. Data heterogeneity in wireless signals and data privacy regulation of distributed sensing have been considered as the major challenges that hinder the wide applications of wireless sensing in large area networking systems. We propose a novel zero-shot wireless sensing solution that allows models constructed in one or a limited number of locations to be directly transferred to other locations without any labeled data.
arXiv Detail & Related papers (2023-12-08T13:50:30Z)
WiFi-TCN: Temporal Convolution for Human Interaction Recognition based on WiFi signal [4.0773490083614075]
Wi-Fi based human activity recognition has gained considerable interest in recent times. A challenge associated with Wi-Fi-based HAR is the significant decline in performance when the scene or subject changes. We propose a novel approach that leverages a temporal convolution network with augmentations and attention, referred to as TCN-AA.
arXiv Detail & Related papers (2023-05-21T08:37:32Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID) Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance. This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds [51.47100091540298]
We present Cascaded Primitive Fitting Networks (CPFN) that relies on an adaptive patch sampling network to assemble detection results of global and local primitive detection networks. CPFN improves the state-of-the-art SPFN performance by 13-14% on high-resolution point cloud datasets and specifically improves the detection of fine-scale primitives by 20-22%.
arXiv Detail & Related papers (2021-08-31T23:27:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.