SAM 3D Body: Robust Full-Body Human Mesh Recovery
- URL: http://arxiv.org/abs/2602.15989v1
- Date: Tue, 17 Feb 2026 20:26:37 GMT
- Title: SAM 3D Body: Robust Full-Body Human Mesh Recovery
- Authors: Xitong Yang, Devansh Kukreja, Don Pinkus, Anushka Sagar, Taosha Fan, Jinhyung Park, Soyong Shin, Jinkun Cao, Jiawei Liu, Nicolas Ugrinovic, Matt Feiszli, Jitendra Malik, Piotr Dollar, Kris Kitani,
- Abstract summary: We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR). 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape.
- Score: 65.0108906331903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR) that demonstrates state-of-the-art performance, with strong generalization and consistent accuracy in diverse in-the-wild conditions. 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape. 3DB employs an encoder-decoder architecture and supports auxiliary prompts, including 2D keypoints and masks, enabling user-guided inference similar to the SAM family of models. We derive high-quality annotations from a multi-stage annotation pipeline that uses various combinations of manual keypoint annotation, differentiable optimization, multi-view geometry, and dense keypoint detection. Our data engine efficiently selects and processes data to ensure data diversity, collecting unusual poses and rare imaging conditions. We present a new evaluation dataset organized by pose and appearance categories, enabling nuanced analysis of model behavior. Our experiments demonstrate superior generalization and substantial improvements over prior methods in both qualitative user preference studies and traditional quantitative analysis. Both 3DB and MHR are open-source.
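The abstract describes two ideas worth making concrete: a parametric rig (MHR) that decouples skeletal structure from surface shape, and SAM-style auxiliary prompts (2D keypoints, masks) that condition inference. The sketch below is a minimal, hypothetical illustration of that interface; all names (`MHRParams`, `PromptableHMR`) and internals are illustrative placeholders, not the released 3DB/MHR API.

```python
# Hypothetical sketch of prompt-conditioned human mesh recovery, illustrating
# (a) decoupled skeleton/shape parameters and (b) optional auxiliary prompts.
# Names and internals are illustrative only, not the open-source 3DB API.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class MHRParams:
    """Decoupled parametric representation: skeleton vs. surface shape."""
    skeleton_pose: np.ndarray   # joint rotations, e.g. (num_joints, 3) axis-angle
    skeleton_scale: np.ndarray  # per-bone scales, independent of surface shape
    shape_coeffs: np.ndarray    # surface-shape blend coefficients

class PromptableHMR:
    """Toy encoder-decoder stand-in: optional 2D keypoint/mask prompts
    bias the decoded estimate, mimicking the SAM-style guided workflow."""
    def __init__(self, num_joints: int = 22, num_shape: int = 10):
        self.num_joints = num_joints
        self.num_shape = num_shape

    def predict(self, image: np.ndarray,
                keypoints_2d: Optional[np.ndarray] = None,
                mask: Optional[np.ndarray] = None) -> MHRParams:
        # "Encoder": collapse the image to a scalar feature (placeholder).
        feat = float(image.astype(np.float32).mean())
        # Prompts act as auxiliary conditioning; here they simply shift features.
        if keypoints_2d is not None:
            feat += float(keypoints_2d.mean())
        if mask is not None:
            feat += float(mask.mean())
        # "Decoder": emit decoupled skeleton and shape parameters (placeholder).
        pose = np.full((self.num_joints, 3), feat * 1e-3, dtype=np.float32)
        scale = np.ones(self.num_joints, dtype=np.float32)
        shape = np.zeros(self.num_shape, dtype=np.float32)
        return MHRParams(pose, scale, shape)

# Usage: the same image with and without a 2D-keypoint prompt yields
# different (prompt-guided) estimates.
img = np.zeros((256, 256, 3), dtype=np.uint8)
kps = np.array([[128.0, 64.0], [128.0, 192.0]])  # hypothetical 2D keypoints
out_plain = PromptableHMR().predict(img)
out_prompt = PromptableHMR().predict(img, keypoints_2d=kps)
print(out_plain.skeleton_pose.shape)                       # (22, 3)
print(np.allclose(out_plain.skeleton_pose,
                  out_prompt.skeleton_pose))               # False
```

The point of the decoupling is that bone lengths (`skeleton_scale`) can change without altering the surface-shape coefficients, and vice versa; in the released MHR these are separate, independently tunable parameter groups.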
Related papers
- Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset [6.181093777643576]
3D human pose estimation poses several challenges due to the complex geometry of the human body and self-occluding joints. We introduce a framework that effectively conditions on the distribution of human poses and pose history. We present MVPose3D, a large-scale indoor dataset containing multiple modalities.
arXiv Detail & Related papers (2025-12-11T06:11:24Z) - DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior [82.9526308672547]
We present DPoser-X, a diffusion-based prior model for 3D whole-body human poses. Our approach unifies various pose-centric tasks as inverse problems, solving them through variational diffusion sampling. Our model consistently outperforms state-of-the-art alternatives, establishing a new benchmark for whole-body human pose prior modeling.
arXiv Detail & Related papers (2025-08-01T12:56:39Z) - Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets. Our formulation is centered on the ability to query any arbitrary point of the human volume and obtain its estimated location in 3D.
arXiv Detail & Related papers (2024-07-10T10:44:18Z) - ASM: Adaptive Skinning Model for High-Quality 3D Face Modeling [11.885382595302751]
We argue that reconstruction with multi-view uncalibrated images demands a new model with stronger capacity.
We propose Adaptive Skinning Model (ASM), which redefines the skinning model with more compact and fully tunable parameters.
Our work opens up a new research direction for parametric face models and facilitates future research on multi-view reconstruction.
arXiv Detail & Related papers (2023-04-19T04:55:28Z) - Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation [1.1501261942096426]
We introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation.
Our model is able to capture the long-range dependencies between body joints.
Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-11-01T13:48:55Z) - THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3D pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3D reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z) - Monocular Real-time Full Body Capture with Inter-part Correlations [66.22835689189237]
We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image.
Our approach uses a new neural network architecture that exploits correlations between body and hands at high computational efficiency.
arXiv Detail & Related papers (2020-12-11T02:37:56Z) - PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.