SUPER: Seated Upper Body Pose Estimation using mmWave Radars
- URL: http://arxiv.org/abs/2407.02455v1
- Date: Tue, 2 Jul 2024 17:32:34 GMT
- Title: SUPER: Seated Upper Body Pose Estimation using mmWave Radars
- Authors: Bo Zhang, Zimeng Zhou, Boyu Jiang, Rong Zheng,
- Abstract summary: Super is a framework for seated upper body human pose estimation that utilizes dual-mmWave radars in close proximity.
A lightweight neural network extracts both global and local features of upper body and output pose parameters for the Skinned Multi-Person Linear (SMPL) model.
- Score: 6.205521056622584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In industrial countries, adults spend a considerable amount of time sedentary each day at work, driving and during activities of daily living. Characterizing the seated upper body human poses using mmWave radars is an important, yet under-studied topic with many applications in human-machine interaction, transportation and road safety. In this work, we devise SUPER, a framework for seated upper body human pose estimation that utilizes dual-mmWave radars in close proximity. A novel masking algorithm is proposed to coherently fuse data from the radars to generate intensity and Doppler point clouds with complementary information for high-motion but small radar cross section areas (e.g., upper extremities) and low-motion but large RCS areas (e.g. torso). A lightweight neural network extracts both global and local features of upper body and output pose parameters for the Skinned Multi-Person Linear (SMPL) model. Extensive leave-one-subject-out experiments on various motion sequences from multiple subjects show that SUPER outperforms a state-of-the-art baseline method by 30 -- 184%. We also demonstrate its utility in a simple downstream task for hand-object interaction.
Related papers
- RadarGen: Automotive Radar Point Cloud Generation from Cameras [64.69976771710057]
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery.<n>RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form.<n>We show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data.
arXiv Detail & Related papers (2025-12-19T18:57:33Z) - M4Human: A Large-Scale Multimodal mmWave Radar Benchmark for Human Mesh Reconstruction [45.65814462778685]
M4Human is the current largest-scale (661K-frame) multimodal benchmark, featuring high-resolution mmWave radar, RGB, and depth data.<n>M4Human includes high-quality motion capture (MoCap) annotations with 3D meshes and global trajectories.<n>We establish benchmarks on both RT and RPC modalities, as well as multimodal fusion with RGB-D modalities.
arXiv Detail & Related papers (2025-12-13T16:08:59Z) - TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion [54.46664104437454]
We propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion.
Specifically, the graph-based Radar structure extractor and the pyramid-based Radar fusion module are designed.
Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%.
arXiv Detail & Related papers (2025-04-16T05:25:04Z) - RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence [10.115852646162843]
We present Radar-LLM, the first framework that leverages large language models (LLMs) for human understanding using millimeter-wave radar as the sensing modality.
To address data scarcity, we introduce a physics-aware pipeline synthesis that generates realistic radar-text pairs from motion-text datasets.
Radar-LLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions.
arXiv Detail & Related papers (2025-04-14T04:18:25Z) - Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z) - Multi-stage Learning for Radar Pulse Activity Segmentation [51.781832424705094]
Radio signal recognition is a crucial function in electronic warfare.
Precise identification and localisation of radar pulse activities are required by electronic warfare systems.
Deep learning-based radar pulse activity recognition methods have remained largely underexplored.
arXiv Detail & Related papers (2023-12-15T01:56:27Z) - Echoes Beyond Points: Unleashing the Power of Raw Radar Data in
Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z) - HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for
Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals, in a semi-supervised fashion and outperforms existing methods with a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
arXiv Detail & Related papers (2022-12-15T11:15:14Z) - Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data [33.457104508061015]
Scene understanding is crucial for autonomous robots in dynamic environments for making future state predictions, avoiding collisions, and path planning.
Camera and LiDAR perception made tremendous progress in recent years, but face limitations under adverse weather conditions.
To leverage the full potential of multi-modal sensor suites, radar sensors are essential for safety critical tasks and are already installed in most new vehicles today.
arXiv Detail & Related papers (2022-12-07T15:05:03Z) - HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar [30.51398364813315]
This paper introduces a novel human pose estimation benchmark, Human Pose with Millimeter Wave Radar (HuPR)
This dataset is created using cross-calibrated mmWave radar sensors and a monocular RGB camera for cross-modality training of radar-based human pose estimation.
arXiv Detail & Related papers (2022-10-22T22:28:40Z) - Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and
mmWave Radars [15.662787088335618]
This work studies the problem of cross-modal human re-identification (ReID)
We propose the first-of-its-kind vision-RF system for cross-modal multi-person ReID at the same time.
Our proposed system is able to achieve 92.5% top-1 accuracy and 97.5% top-5 accuracy out of 56 volunteers.
arXiv Detail & Related papers (2022-07-16T10:34:25Z) - Multi-View Radar Semantic Segmentation [3.2093811507874768]
Automotive radars are low-cost active sensors that measure properties of surrounding objects.
They are seldom used for scene understanding due to the size and complexity of radar raw data.
We propose several novel architectures, and their associated losses, which analyse multiple "views" of the range-angle-Doppler radar tensor to segment it semantically.
arXiv Detail & Related papers (2021-03-30T09:56:41Z) - Depth Estimation from Monocular Images and Sparse Radar Data [93.70524512061318]
In this paper, we explore the possibility of achieving a more accurate depth estimation by fusing monocular images and Radar points using a deep neural network.
We find that the noise existing in Radar measurements is one of the main key reasons that prevents one from applying the existing fusion methods.
The experiments are conducted on the nuScenes dataset, which is one of the first datasets which features Camera, Radar, and LiDAR recordings in diverse scenes and weather conditions.
arXiv Detail & Related papers (2020-09-30T19:01:33Z) - RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.