M4Human: A Large-Scale Multimodal mmWave Radar Benchmark for Human Mesh Reconstruction
- URL: http://arxiv.org/abs/2512.12378v2
- Date: Wed, 17 Dec 2025 10:00:38 GMT
- Title: M4Human: A Large-Scale Multimodal mmWave Radar Benchmark for Human Mesh Reconstruction
- Authors: Junqiao Fan, Yunjiao Zhou, Yizhuo Yang, Xinyuan Cui, Jiarui Zhang, Lihua Xie, Jianfei Yang, Chris Xiaoxuan Lu, Fangqiang Ding
- Abstract summary: M4Human is the largest-scale (661K-frame) multimodal benchmark to date, featuring high-resolution mmWave radar, RGB, and depth data. M4Human includes high-quality motion capture (MoCap) annotations with 3D meshes and global trajectories. We establish benchmarks on both RT and RPC modalities, as well as multimodal fusion with RGB-D modalities.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Human mesh reconstruction (HMR) provides direct insights into body-environment interaction, enabling various immersive applications. While existing large-scale HMR datasets rely heavily on line-of-sight RGB input, vision-based sensing is limited by occlusion, lighting variation, and privacy concerns. To overcome these limitations, recent efforts have explored radio-frequency (RF) mmWave radar for privacy-preserving indoor human sensing. However, current radar datasets are constrained by sparse skeleton labels, limited scale, and simple in-place actions. To advance the HMR research community, we introduce M4Human, the largest-scale multimodal benchmark to date (661K frames, $9\times$ the prior largest), featuring high-resolution mmWave radar, RGB, and depth data. M4Human provides both raw radar tensors (RT) and processed radar point clouds (RPC) to enable research across different levels of RF signal granularity. It includes high-quality motion capture (MoCap) annotations with 3D meshes and global trajectories, and spans 20 subjects and 50 diverse actions, including in-place, sit-in-place, and free-space sports or rehabilitation movements. We establish benchmarks on the RT and RPC modalities, as well as on multimodal fusion with RGB-D. Extensive results highlight the significance of M4Human for radar-based human modeling while revealing persistent challenges under fast, unconstrained motion. The dataset and code will be released after publication.
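The abstract distinguishes two radar granularities: raw radar tensors (RT) and processed radar point clouds (RPC). As a rough illustration of how the two relate, the sketch below thresholds a synthetic range-azimuth power map into a sparse point cloud; the tensor shape, bin resolutions, field of view, and detection rule are all hypothetical stand-ins for illustration, not the M4Human processing pipeline.

```python
# Illustrative sketch (not the M4Human pipeline): deriving a processed
# radar point cloud (RPC) from a raw radar tensor (RT) by thresholding
# a range-azimuth power map. All constants are made-up examples.
import numpy as np

RANGE_RES_M = 0.05           # hypothetical range-bin size in meters
AZIMUTH_FOV_RAD = np.pi / 2  # hypothetical +/-45 degree field of view

def tensor_to_point_cloud(power, k=3.0):
    """Convert a (range, azimuth) power map into 2D points.

    Keeps bins whose power exceeds mean + k * std (a crude stand-in
    for CFAR detection) and maps bin indices to Cartesian x, y.
    Returns an (N, 3) array of [x, y, intensity] rows.
    """
    n_range, n_azimuth = power.shape
    threshold = power.mean() + k * power.std()
    r_idx, a_idx = np.nonzero(power > threshold)
    ranges = r_idx * RANGE_RES_M
    # Map azimuth bins linearly onto the field of view.
    angles = (a_idx / (n_azimuth - 1) - 0.5) * AZIMUTH_FOV_RAD
    x = ranges * np.sin(angles)
    y = ranges * np.cos(angles)
    intensity = power[r_idx, a_idx]
    return np.stack([x, y, intensity], axis=1)

rng = np.random.default_rng(0)
rt = rng.rayleigh(scale=1.0, size=(128, 64))  # noise-only tensor
rt[40, 32] = 50.0                             # one strong reflector
points = tensor_to_point_cloud(rt)
```

The point of the sketch is the information loss it makes visible: the point cloud keeps only a handful of detections (plus a few noise hits), while the raw tensor preserves the full power distribution, which is why benchmarks on both representations are informative.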
Related papers
- Real-Time 4D Radar Perception for Robust Human Detection in Harsh Enclosed Environments [10.956927578406388]
This paper introduces a novel methodology for generating controlled, multi-level dust concentrations in a cluttered setting representative of harsh, enclosed environments. We also present a new 4D mmWave radar dataset, augmented by camera and LiDAR, illustrating how dust particles and reflective surfaces jointly impact the sensing functionality.
arXiv Detail & Related papers (2026-01-19T20:01:15Z)
- Wavelet-based Multi-View Fusion of 4D Radar Tensor and Camera for Robust 3D Object Detection [44.78575994732947]
WRCFormer is a novel 3D object detection framework that fuses raw radar cubes with camera inputs via multi-view representations of the decoupled radar cube. WRCFormer achieves state-of-the-art performance on the K-Radar benchmarks, surpassing the previous best model by approximately 2.4% in all scenarios.
arXiv Detail & Related papers (2025-12-28T15:32:17Z)
- RadarGen: Automotive Radar Point Cloud Generation from Cameras [64.69976771710057]
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form. We show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data.
arXiv Detail & Related papers (2025-12-19T18:57:33Z)
- mmPred: Radar-based Human Motion Prediction in the Dark [43.00006337997152]
Existing Human Motion Prediction (HMP) methods based on RGB-D cameras are sensitive to lighting conditions and raise privacy concerns. This work introduces radar as a sensing modality for HMP for the first time. We propose mmPred, the first diffusion-based framework tailored for radar-based HMP.
arXiv Detail & Related papers (2025-11-29T06:26:55Z)
- RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence [10.115852646162843]
We present RadarLLM, the first framework that leverages large language models (LLMs) for human motion understanding using millimeter-wave radar as the sensing modality. To address data scarcity, we introduce a physics-aware synthesis pipeline that generates realistic radar-text pairs from motion-text datasets. RadarLLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions.
arXiv Detail & Related papers (2025-04-14T04:18:25Z)
- HeRCULES: Heterogeneous Radar Dataset in Complex Urban Environment for Multi-session Radar SLAM [9.462058316827804]
The HeRCULES dataset is a comprehensive multi-modal dataset with heterogeneous radars, FMCW LiDAR, IMU, GPS, and cameras. It is the first dataset to integrate 4D radar and spinning radar alongside FMCW LiDAR, offering unparalleled localization, mapping, and place recognition capabilities.
arXiv Detail & Related papers (2025-02-04T02:41:00Z)
- V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection [64.93675471780209]
We present V2X-R, the first simulated V2X dataset incorporating LiDAR, camera, and 4D radar. V2X-R contains 12,079 scenarios with 37,727 frames of LiDAR and 4D radar point clouds, 150,908 images, and 170,859 annotated 3D vehicle bounding boxes. We propose a novel cooperative LiDAR-4D radar fusion pipeline for 3D object detection and implement it with various fusion strategies.
arXiv Detail & Related papers (2024-11-13T07:41:47Z)
- SUPER: Seated Upper Body Pose Estimation using mmWave Radars [6.205521056622584]
SUPER is a framework for seated upper-body human pose estimation that utilizes dual mmWave radars in close proximity.
A lightweight neural network extracts both global and local features of the upper body and outputs pose parameters for the Skinned Multi-Person Linear (SMPL) model.
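The SMPL model that SUPER regresses deforms a template mesh with pose parameters via linear blend skinning (LBS). The sketch below shows LBS in its simplest form on a toy two-joint skeleton; it omits SMPL's kinematic chain, shape blendshapes, and learned skinning weights, and all values are made up for illustration.

```python
# Minimal linear blend skinning (LBS) sketch of the idea behind SMPL:
# a template mesh is deformed by per-joint rotations weighted by
# per-vertex blend weights. Toy two-joint "arm", not the real SMPL model.
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def lbs(vertices, joints, rotations, weights):
    """Skin vertices: v' = sum_j w_vj * (R_j @ (v - j_pos) + j_pos)."""
    skinned = np.zeros_like(vertices)
    for j, (pos, R) in enumerate(zip(joints, rotations)):
        skinned += weights[:, j:j + 1] * ((vertices - pos) @ R.T + pos)
    return skinned

# Two vertices on a straight "arm", two joints (shoulder, elbow).
verts = np.array([[0.5, 0.0, 0.0], [1.5, 0.0, 0.0]])
joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])  # rigid per-vertex binding
rots = [np.eye(3), rot_z(np.pi / 2)]          # bend the elbow 90 degrees
posed = lbs(verts, joints, rots, weights)
# The forearm vertex pivots about the elbow; the upper-arm vertex stays put.
```

In the full SMPL formulation the per-joint transforms are composed along a kinematic tree and the template is first offset by shape and pose blendshapes, but the weighted-transform step above is the core skinning operation a pose regressor feeds into.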
arXiv Detail & Related papers (2024-07-02T17:32:34Z)
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- Diffusion Models for Interferometric Satellite Aperture Radar [73.01013149014865]
Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models.
Here, we leverage PDMs to generate several radar-based satellite image datasets.
We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue.
arXiv Detail & Related papers (2023-08-31T16:26:17Z)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)