RAPTR: Radar-based 3D Pose Estimation using Transformer
- URL: http://arxiv.org/abs/2511.08387v1
- Date: Wed, 12 Nov 2025 01:56:55 GMT
- Title: RAPTR: Radar-based 3D Pose Estimation using Transformer
- Authors: Sorachi Kato, Ryoma Yataka, Pu Perry Wang, Pedro Miraldo, Takuya Fujihashi, Petros Boufounos,
- Abstract summary: Radar-based indoor 3D human pose estimation typically relied on fine-grained 3D keypoint labels.<n>We propose textbfRAPTR (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels.
- Score: 24.646708425495472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radar-based indoor 3D human pose estimation typically relied on fine-grained 3D keypoint labels, which are costly to obtain especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose \textbf{RAPTR} (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels which are considerably easier and more scalable to collect. Our RAPTR is characterized by a two-stage pose decoder architecture with a pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities; and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods, reducing joint position error by $34.3\%$ on HIBER and $76.9\%$ on MMVR. Our implementation is available at https://github.com/merlresearch/radar-pose-transformer.
Related papers
- PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space [62.10630827126755]
PandaPose is a 3D human pose lifting approach via propagating 2D pose prior to 3D anchor space as the unified intermediate representation.<n>Our 3D anchor space comprises: (1) Joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors to mitigate 2D pose estimation inaccuracies.
arXiv Detail & Related papers (2026-02-01T08:20:40Z) - VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering [18.77072205559739]
VSRD++ is a novel weakly supervised framework for monocular 3D object detection.<n>It eliminates the reliance on 3D annotations and leverages neural-field-based volumetric rendering.<n>In the monocular 3D object detection phase, the optimized 3D bounding boxes serve as pseudo labels.
arXiv Detail & Related papers (2025-12-01T01:28:35Z) - PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection [13.60524473223155]
This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects.<n>PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space.<n>Our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner.
arXiv Detail & Related papers (2024-10-01T01:40:22Z) - VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection [11.061100776969383]
Monocular 3D object detection poses a significant challenge in 3D scene understanding.
Existing methods heavily rely on supervised learning using abundant 3D labels.
We propose a novel weakly supervised 3D object detection framework named VSRD.
arXiv Detail & Related papers (2024-03-29T20:43:55Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
In this paper, we propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Anchor-free 3D Single Stage Detector with Mask-Guided Attention for
Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We overcome this by converting the voxel-based sparse 3D feature volumes into the sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z) - FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z) - FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z) - F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single
Object Tracking [12.644452175343059]
A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates.
Instead of relying on 3D proposals, we produce 2D region proposals which are then extruded into 3D viewing frustums.
We perform an online accuracy validation on the 3D frustum to generate refined point cloud searching space.
arXiv Detail & Related papers (2020-10-22T08:01:17Z) - RTM3D: Real-time Monocular 3D Detection from Object Keypoints for
Autonomous Driving [26.216609821525676]
Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component.
Our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilize the geometric relationship of 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space.
Our method is the first real-time system for monocular image 3D detection while achieves state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2020-01-10T08:29:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.