Self-supervised One-Stage Learning for RF-based Multi-Person Pose Estimation
- URL: http://arxiv.org/abs/2506.05420v1
- Date: Thu, 05 Jun 2025 00:40:27 GMT
- Title: Self-supervised One-Stage Learning for RF-based Multi-Person Pose Estimation
- Authors: Seunghwan Shin, Yusung Kim
- Abstract summary: This paper proposes an efficient and lightweight one-stage MPPE model based on raw RF signals. By sub-grouping RF signals and embedding them using a shared single-layer CNN followed by multi-head attention, this model outperforms previous methods. Our model improves MPPE accuracy by up to 15 in PCKh@0.5 compared to previous methods using raw RF signals.
- Score: 1.4182672294839365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of Multi-Person Pose Estimation (MPPE), Radio Frequency (RF)-based methods can operate effectively regardless of lighting conditions and even when the line of sight is obstructed. Existing RF-based MPPE methods typically involve either 1) converting RF signals into heatmap images through complex preprocessing, or 2) applying a deep embedding network directly to raw RF signals. The first approach, while delivering decent performance, is computationally intensive and time-consuming. The second approach, though simpler in preprocessing, results in lower MPPE accuracy and generalization performance. This paper proposes an efficient and lightweight one-stage MPPE model based on raw RF signals. By sub-grouping RF signals and embedding them using a shared single-layer CNN followed by multi-head attention, this model outperforms previous methods that embed all signals at once through a large and deep CNN. Additionally, we propose a new self-supervised learning (SSL) method that takes inputs from both one unmasked subgroup and the remaining masked subgroups to predict the latent representations of the masked data. Empirical results demonstrate that our model improves MPPE accuracy by up to 15 in PCKh@0.5 compared to previous methods using raw RF signals. In particular, the proposed SSL method significantly improves performance when the system is deployed in new locations or when obstacles are placed in front of the RF antennas, with larger gains as the number of people increases. Our code and dataset are available on GitHub: https://github.com/sshnan7/SOSPE .
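The abstract reports accuracy in PCKh@0.5 without defining it. As a rough illustration, the sketch below uses the common definition of the metric: a predicted keypoint counts as correct when its distance to the ground truth is at most 0.5 times the person's head-segment length. The function name `pckh`, the 2D keypoints, and the toy values are assumptions made for this example and are not taken from the paper or its repository.

```python
import math

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of keypoints whose error is within alpha * head size.

    pred, gt: one list of (x, y) keypoints per person (hypothetical layout).
    head_sizes: one head-segment length per person, in the same units.
    """
    correct = 0
    total = 0
    for p_kps, g_kps, head in zip(pred, gt, head_sizes):
        for (px, py), (gx, gy) in zip(p_kps, g_kps):
            dist = math.hypot(px - gx, py - gy)
            correct += dist <= alpha * head  # True counts as 1
            total += 1
    return correct / total if total else 0.0

# Toy data: two people, two keypoints each (values invented for illustration).
gt = [[(0.0, 0.0), (10.0, 0.0)], [(5.0, 5.0), (5.0, 15.0)]]
pred = [[(1.0, 0.0), (10.0, 4.0)], [(5.0, 6.0), (5.0, 15.0)]]
head_sizes = [4.0, 4.0]

score = pckh(pred, gt, head_sizes)
print(f"PCKh@0.5 = {score:.2f}")  # prints "PCKh@0.5 = 0.75"
```

Here three of the four keypoints fall within 2.0 units (0.5 times a head size of 4.0), so the score is 0.75. Evaluation scripts typically report this value as a percentage.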
Related papers
- High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution [87.56382172827526]
High-frequency regions are most critical for reconstruction. We propose a training-free adaptive masking module for acceleration. Our method reduces FLOPs by 24–43% for state-of-the-art models.
arXiv Detail & Related papers (2025-05-11T13:18:03Z)
- Few-Shot Radar Signal Recognition through Self-Supervised Learning and Radio Frequency Domain Adaptation [48.265859815346985]
Radar signal recognition plays a pivotal role in electronic warfare (EW). Recent advances in deep learning have shown significant potential in improving radar signal recognition. These methods fall short in EW scenarios where annotated radio frequency (RF) data are scarce or impractical to obtain.
arXiv Detail & Related papers (2025-01-07T01:35:56Z)
- Residual Channel Boosts Contrastive Learning for Radio Frequency Fingerprint Identification [17.98760668117099]
This paper proposes a residual channel-based data augmentation strategy for Radio Frequency Fingerprint Identification (RFFI). We show that our method significantly enhances both feature extraction ability and generalization while requiring fewer samples and less time.
arXiv Detail & Related papers (2024-12-12T02:48:20Z)
- FreSh: Frequency Shifting for Accelerated Neural Representation Learning [11.175745750843484]
Implicit Neural Representations (INRs) have recently gained attention as a powerful approach for continuously representing signals such as images, videos, and 3D shapes using multilayer perceptrons (MLPs).
MLPs are known to exhibit a low-frequency bias, limiting their ability to capture high-frequency details accurately.
We propose frequency shifting (or FreSh) to align the frequency spectrum of the initial output with that of the target signal.
arXiv Detail & Related papers (2024-10-07T14:05:57Z)
- Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training. We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
arXiv Detail & Related papers (2024-09-16T15:10:07Z)
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
- Disentangled Representation Learning for RF Fingerprint Extraction under Unknown Channel Statistics [77.13542705329328]
We propose a framework of disentangled representation learning (DRL) that first learns to factor the input signals into a device-relevant component and a device-irrelevant component via adversarial learning.
The implicit data augmentation in the proposed framework imposes a regularization on the RFF extractor to avoid the possible overfitting of device-irrelevant channel statistics.
Experiments validate that the proposed approach, referred to as DR-RFF, outperforms conventional methods in terms of generalizability to unknown complicated propagation environments.
arXiv Detail & Related papers (2022-08-04T15:46:48Z)
- New SAR target recognition based on YOLO and very deep multi-canonical correlation analysis [0.1503974529275767]
This paper proposes a robust feature extraction method for SAR image target classification by adaptively fusing effective features from different CNN layers.
Experiments on the MSTAR dataset demonstrate that the proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T18:10:26Z)
- Estimation of Camera Response Function using Prediction Consistency and Gradual Refinement with an Extension to Deep Learning [42.70498574189067]
Most existing methods for CRF estimation from a single image fail to handle general real images.
We introduce a non-deep-learning method using prediction consistency and gradual refinement.
Our method outperforms the existing single-image methods for daytime and nighttime real images.
arXiv Detail & Related papers (2020-10-08T14:19:48Z)