Humans in 4D: Reconstructing and Tracking Humans with Transformers
- URL: http://arxiv.org/abs/2305.20091v3
- Date: Thu, 31 Aug 2023 16:45:40 GMT
- Title: Humans in 4D: Reconstructing and Tracking Humans with Transformers
- Authors: Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo
Kanazawa, Jitendra Malik
- Abstract summary: We present an approach to reconstruct humans and track them over time.
At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery.
This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images.
- Score: 72.50856500760352
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present an approach to reconstruct humans and track them over time. At the
core of our approach, we propose a fully "transformerized" version of a network
for human mesh recovery. This network, HMR 2.0, advances the state of the art
and shows the capability to analyze unusual poses that have in the past been
difficult to reconstruct from single images. To analyze video, we use 3D
reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.
This enables us to deal with multiple people and maintain identities through
occlusion events. Our complete approach, 4DHumans, achieves state-of-the-art
results for tracking people from monocular video. Furthermore, we demonstrate
the effectiveness of HMR 2.0 on the downstream task of action recognition,
achieving significant improvements over previous pose-based action recognition
approaches. Our code and models are available on the project website:
https://shubham-goel.github.io/4dhumans/.
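The pipeline the abstract describes (single-image mesh recovery per frame, feeding a tracking system that associates people in 3D to maintain identities) can be sketched as follows. This is an illustrative simplification, not the released 4DHumans API: `hmr2_stub`, `Tracker3D`, and all other names here are hypothetical, and the greedy nearest-neighbour association below merely stands in for the paper's actual tracking system.

```python
from dataclasses import dataclass

@dataclass
class MeshEstimate:
    """One per-person, per-frame 3D reconstruction (root translation only)."""
    frame: int
    root_xyz: tuple

def hmr2_stub(frame_idx, people):
    """Stand-in for HMR 2.0: one 3D estimate per detected person in a frame."""
    return [MeshEstimate(frame_idx, xyz) for xyz in people]

class Tracker3D:
    """Greedy nearest-neighbour association in 3D coordinates."""
    def __init__(self, max_dist=1.0):
        self.max_dist = max_dist
        self.tracks = {}   # track id -> list of MeshEstimate
        self.next_id = 0

    def update(self, estimates):
        for est in estimates:
            best_id, best_d = None, self.max_dist
            for tid, hist in self.tracks.items():
                last = hist[-1]
                d = sum((a - b) ** 2
                        for a, b in zip(last.root_xyz, est.root_xyz)) ** 0.5
                # Only match tracks not yet updated in this frame.
                if d < best_d and last.frame < est.frame:
                    best_id, best_d = tid, d
            if best_id is None:           # no close track: start a new identity
                best_id = self.next_id
                self.next_id += 1
                self.tracks[best_id] = []
            self.tracks[best_id].append(est)

# Two people moving over three frames; identities are maintained in 3D.
video = [
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
    [(0.1, 0.0, 2.0), (1.1, 0.0, 2.0)],
    [(0.2, 0.0, 2.0), (1.2, 0.0, 2.0)],
]
tracker = Tracker3D()
for t, people in enumerate(video):
    tracker.update(hmr2_stub(t, people))
print(len(tracker.tracks))  # → 2
```

Associating detections by 3D root position rather than 2D image overlap is what lets such a tracker carry identities through occlusions, since two people who overlap in the image can still be well separated in depth.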
Related papers
- SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos [53.227781131348856]
Human Mesh Recovery aims to reconstruct 3D human pose and shape from 2D observations. Recent image-based HMR methods such as SAM 3D Body achieve strong robustness on in-the-wild images. We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos.
arXiv Detail & Related papers (2025-12-09T09:37:31Z)
- Human3R: Everyone Everywhere All at Once [69.16576238974876]
We present Human3R, a feed-forward framework for online 4D human-scene reconstruction from monocular videos. Human3R is a unified model that eliminates heavy dependencies and iterative refinement. It delivers superior performance across tasks, including global human motion estimation, local human mesh recovery, video depth estimation, and camera pose estimation.
arXiv Detail & Related papers (2025-10-07T17:59:52Z)
- HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction [15.368018463074058]
HAMSt3R is an extension of MASt3R for joint human and scene 3D reconstruction from sparse, uncalibrated images. Our method incorporates additional network heads to segment people, estimate dense correspondences via DensePose, and predict depth in human-centric environments.
arXiv Detail & Related papers (2025-08-22T14:43:18Z)
- UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting [54.883935964137706]
We introduce UAV4D, a framework for enabling photorealistic rendering of dynamic real-world scenes captured by UAVs. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.
arXiv Detail & Related papers (2025-06-05T13:21:09Z) - ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos [18.73641648585445]
Recent neural rendering advances have enabled holistic human-scene reconstruction but require pre-calibrated camera and human poses.
We introduce a novel unified framework that simultaneously performs camera tracking, human pose estimation and human-scene reconstruction in an online fashion.
Specifically, we design a human deformation module that faithfully reconstructs fine details and enhances generalizability to out-of-distribution poses.
arXiv Detail & Related papers (2025-04-17T17:59:02Z) - WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis.
Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z) - Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera [3.6948631725065355]
We present DiffOpt, a novel 3D global HMR method using Diffusion Optimization.
Our key insight is that recent advances in human motion generation, such as the motion diffusion model (MDM), contain a strong prior of coherent human motion.
We validate DiffOpt with video sequences from the Electromagnetic Database of Global 3D Human Pose and Shape in the Wild.
arXiv Detail & Related papers (2024-11-15T21:09:40Z) - SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion [35.73448283467723]
SiTH is a novel pipeline that integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow.
We employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images.
For the latter, we leverage skinned body meshes as guidance to recover full-body texture meshes from the input and back-view images.
arXiv Detail & Related papers (2023-11-27T14:22:07Z) - TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D
Environments [106.80978555346958]
Current methods cannot reliably estimate moving humans in global coordinates.
TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras.
It achieves state-of-the-art performance on tracking and HPS benchmarks.
arXiv Detail & Related papers (2023-06-05T13:00:44Z) - Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z) - Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using
Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes.
Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z) - UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture [70.59984501516084]
UnrealEgo is a new large-scale naturalistic dataset for egocentric 3D human pose estimation.
It is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments.
We propose a new benchmark method with a simple but effective idea of devising a 2D keypoint estimation module for stereo inputs to improve 3D human pose estimation.
arXiv Detail & Related papers (2022-08-02T17:59:54Z) - Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z) - On Development and Evaluation of Retargeting Human Motion and Appearance
in Monocular Videos [2.870762512009438]
Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision.
We propose a novel and high-performant approach based on a hybrid image-based rendering technique that exhibits competitive visual quality.
We also present a new video benchmark dataset composed of different videos with annotated human motions to evaluate the task of synthesizing people's videos.
arXiv Detail & Related papers (2021-03-29T13:17:41Z) - 4D Human Body Capture from Egocentric Video via 3D Scene Grounding [38.3169520384642]
We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos.
The unique viewpoint and rapid embodied camera motion of egocentric videos raise additional technical barriers for human body capture.
arXiv Detail & Related papers (2020-11-26T15:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.