SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
- URL: http://arxiv.org/abs/2512.08406v1
- Date: Tue, 09 Dec 2025 09:37:31 GMT
- Title: SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
- Authors: Mingqi Gao, Yunqi Miao, Jungong Han
- Abstract summary: Human Mesh Recovery aims to reconstruct 3D human pose and shape from 2D observations. Recent image-based HMR methods such as SAM 3D Body achieve strong robustness on in-the-wild images. We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos.
- Score: 53.227781131348856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human Mesh Recovery (HMR) aims to reconstruct 3D human pose and shape from 2D observations and is fundamental to human-centric understanding in real-world scenarios. While recent image-based HMR methods such as SAM 3D Body achieve strong robustness on in-the-wild images, they rely on per-frame inference when applied to videos, leading to temporal inconsistency and degraded performance under occlusions. We address these issues without extra training by leveraging the inherent human continuity in videos. We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos. We first generate identity-consistent masklets using a promptable video segmentation model, then refine them with an Occlusion-Aware module to recover missing regions. The refined masklets guide SAM 3D Body to produce consistent full-body mesh trajectories, while a padding-based parallel strategy enables efficient multi-human inference. Experimental results demonstrate that SAM-Body4D achieves improved temporal stability and robustness in challenging in-the-wild videos, without any retraining. Our code and demo are available at: https://github.com/gaomingqi/sam-body4d.
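The abstract's pipeline (identity-consistent masklet segmentation, occlusion-aware refinement, mask-guided mesh recovery, and padding-based batching for multi-human inference) can be sketched roughly as follows. Every function name and data representation below is a hypothetical placeholder standing in for the real models, not the released sam-body4d API:

```python
# Hypothetical sketch of the SAM-Body4D pipeline described in the abstract.
# All names and data shapes are illustrative placeholders, not the authors' code.

def segment_masklets(frames, person_ids):
    """Stand-in for a promptable video segmentation model: returns one
    identity-consistent masklet (a mask per frame) for each prompted person.
    A None entry marks a frame where the person is occluded or lost."""
    return {pid: [f"mask[{pid}@{t}]" for t in range(len(frames))]
            for pid in person_ids}

def refine_occlusions(masklets):
    """Stand-in for the Occlusion-Aware module: recover missing regions,
    here simply by carrying the last visible mask forward in time."""
    refined = {}
    for pid, masks in masklets.items():
        filled, last = [], None
        for m in masks:
            last = m if m is not None else last
            filled.append(last)
        refined[pid] = filled
    return refined

def pad_for_parallel(per_frame_people, pad_value=None):
    """Padding-based parallel strategy: pad every frame's person list to the
    maximum person count so all frames batch into one uniform call."""
    width = max(len(people) for people in per_frame_people)
    return [people + [pad_value] * (width - len(people))
            for people in per_frame_people]

def recover_meshes(refined_masklets):
    """Stand-in for mask-guided SAM 3D Body inference, yielding a temporally
    consistent full-body mesh trajectory per tracked person."""
    return {pid: [f"mesh({m})" for m in masks]
            for pid, masks in refined_masklets.items()}

if __name__ == "__main__":
    frames = list(range(4))
    masklets = segment_masklets(frames, ["p0", "p1"])
    masklets["p1"][2] = None                 # simulate an occluded frame
    trajectories = recover_meshes(refine_occlusions(masklets))
    print(trajectories["p1"])
```

The key idea the sketch tries to capture is that temporal identity is resolved at the mask level (cheap, 2D) before any mesh inference runs, so the per-frame mesh recovery model never has to reason about tracking itself.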
Related papers
- Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video [81.44600627066747]
Mesh4D is a feed-forward model for monocular 4D mesh reconstruction. Our key contribution is a compact latent space that encodes the entire animation sequence in a single pass. We evaluate Mesh4D on reconstruction and novel-view benchmarks, outperforming prior methods in recovering accurate 3D shape and deformation.
arXiv Detail & Related papers (2026-01-08T18:59:56Z)
- Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video [56.781766315691854]
We introduce Restage4D, a geometry-preserving pipeline for video-conditioned 4D restaging. We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance.
arXiv Detail & Related papers (2025-08-08T21:31:51Z)
- Humans in 4D: Reconstructing and Tracking Humans with Transformers [72.50856500760352]
We present an approach to reconstruct humans and track them over time.
At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery.
This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images.
arXiv Detail & Related papers (2023-05-31T17:59:52Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos [47.601288796052714]
We develop two novel pose calibration frameworks, i.e., Serial PC-HMR and Parallel PC-HMR.
Our frameworks are based on generic and complementary integration of data-driven learning and geometrical modeling.
We perform extensive experiments on the popular benchmarks, i.e., Human3.6M, 3DPW and SURREAL, where our PC-HMR frameworks achieve the SOTA results.
arXiv Detail & Related papers (2021-03-16T12:12:45Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people from an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.