Better Together: Unified Motion Capture and 3D Avatar Reconstruction
- URL: http://arxiv.org/abs/2503.09293v1
- Date: Wed, 12 Mar 2025 11:39:43 GMT
- Title: Better Together: Unified Motion Capture and 3D Avatar Reconstruction
- Authors: Arthur Moreau, Mohammed Brahimi, Richard Shaw, Athanasios Papaioannou, Thomas Tanay, Zhensong Zhang, Eduardo Pérez-Pellitero
- Abstract summary: We present a method that simultaneously solves the human pose estimation problem while reconstructing a 3D human avatar from multi-view videos. We introduce a novel animatable avatar with 3D Gaussians rigged on a personalized mesh. We first evaluate our method on highly challenging yoga poses and demonstrate state-of-the-art accuracy on multi-view human pose estimation.
- Score: 6.329917162442801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Better Together, a method that simultaneously solves the human pose estimation problem while reconstructing a photorealistic 3D human avatar from multi-view videos. While prior art usually solves these problems separately, we argue that joint optimization of skeletal motion with a 3D renderable body model brings synergistic effects, i.e. yields more precise motion capture and improved visual quality of real-time rendering of avatars. To achieve this, we introduce a novel animatable avatar with 3D Gaussians rigged on a personalized mesh and propose to optimize the motion sequence with time-dependent MLPs that provide accurate and temporally consistent pose estimates. We first evaluate our method on highly challenging yoga poses and demonstrate state-of-the-art accuracy on multi-view human pose estimation, reducing error by 35% on body joints and 45% on hand joints compared to keypoint-based methods. At the same time, our method significantly boosts the visual quality of animatable avatars (+2dB PSNR on novel view synthesis) on diverse challenging subjects.
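The temporal-consistency idea behind the time-dependent MLPs can be illustrated on a toy problem: parameterizing per-frame poses as a smooth function of time acts as an implicit temporal prior over noisy per-frame estimates. A minimal sketch, assuming a single hypothetical 1-DoF "pose" and a hand-rolled two-layer MLP (the paper's actual model optimizes full skeletal poses against photometric losses through 3D Gaussians rigged on a mesh):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground-truth motion: one joint angle varying smoothly over time.
T = 50
t = np.linspace(0.0, 1.0, T)[:, None]          # (T, 1) normalized timestamps
gt = np.sin(2 * np.pi * t)                     # (T, 1) true joint angle
noisy = gt + 0.3 * rng.normal(size=gt.shape)   # noisy per-frame "keypoint" estimates

# Tiny time-dependent MLP: t -> pose (here, a single angle).
H = 16
W1 = rng.normal(scale=1.0, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(2000):
    h = np.tanh(t @ W1 + b1)        # (T, H) hidden activations
    pred = h @ W2 + b2              # (T, 1) predicted pose per frame
    err = pred - noisy              # gradient of the per-frame MSE w.r.t. pred
    # Manual backprop through the two layers.
    gW2 = h.T @ err / T; gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = t.T @ dh / T; gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

h = np.tanh(t @ W1 + b1)
pred = h @ W2 + b2
# Because the MLP is a smooth function of time, its fit has far less
# frame-to-frame jitter than the raw per-frame estimates.
print(np.abs(np.diff(pred[:, 0])).sum()
      < np.abs(np.diff(noisy[:, 0])).sum())  # True
```

The same mechanism scales to full pose vectors: replacing independent per-frame pose variables with a network conditioned on time trades a little flexibility for temporally consistent estimates.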
Related papers
- FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images [74.86864398919467]
We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images.
We learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization.
Our method generates more authentic reconstructions and animations than state-of-the-art approaches, and generalizes directly to inputs from casually taken phone photos.
arXiv Detail & Related papers (2025-03-24T23:20:47Z) - Multimodal Generation of Animatable 3D Human Models with AvatarForge [67.31920821192323]
AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation.
Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
arXiv Detail & Related papers (2025-03-11T08:29:18Z) - Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos [64.10307207290039]
We introduce a novel framework for modeling high-fidelity, animatable 3D human avatars from motion-blurred monocular video inputs. By explicitly modeling human motion trajectories during exposure time, we jointly optimize the trajectories and 3D Gaussians to reconstruct sharp, high-quality human avatars.
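The blur-formation model underlying this line of work treats a blurred frame as the aggregate of sharp renders taken along the motion trajectory within the exposure window. A one-dimensional sketch, where the `render` function and a Gaussian-blob "scene" are hypothetical stand-ins for the actual 3D Gaussian renderer:

```python
import numpy as np

def render(x0, width=64, sigma=2.0):
    """Render a toy 1D 'scene': a Gaussian blob centered at x0."""
    px = np.arange(width)
    return np.exp(-0.5 * ((px - x0) / sigma) ** 2)

def blurred_render(x_start, x_end, n_sub=32):
    """Blurred frame = average of sharp renders along a linear trajectory."""
    xs = np.linspace(x_start, x_end, n_sub)
    return np.mean([render(x) for x in xs], axis=0)

sharp = render(32.0)
blur = blurred_render(24.0, 40.0)   # the blob moves 16 px during exposure
# Motion spreads the blob: lower peak intensity, (approximately) equal
# total mass.
print(blur.max() < sharp.max())  # True
```

Jointly optimizing the trajectory endpoints and the scene against observed blurry frames is then a well-posed fitting problem, rather than a blind deblurring one.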
arXiv Detail & Related papers (2025-01-23T02:31:57Z) - AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction [26.82525451095629]
We propose a robust method for 3D reconstruction of inconsistent images, enabling real-time rendering during inference. We recast the reconstruction problem as a 4D task and introduce an efficient 3D modeling approach using 4D Gaussian Splatting. Experiments demonstrate that our method achieves photorealistic, real-time animation of 3D human avatars from in-the-wild images.
arXiv Detail & Related papers (2024-12-03T18:55:39Z) - Bundle Adjusted Gaussian Avatars Deblurring [31.718130377229482]
We propose a 3D-aware, physics-oriented model of blur formation attributable to human movement, together with a 3D human motion model, to resolve ambiguities in motion-blurred images.
We have established benchmarks for this task through a synthetic dataset derived from existing multi-view captures, alongside a real-captured dataset acquired through a 360-degree synchronous hybrid-exposure camera system.
arXiv Detail & Related papers (2024-11-24T10:03:24Z) - AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos [31.904839609743448]
Existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people.
We propose a novel method leveraging the personalized implicit neural avatar of each individual as a prior.
Our experimental results demonstrate state-of-the-art performance on several public datasets.
arXiv Detail & Related papers (2024-08-04T18:41:35Z) - GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos [56.40776739573832]
We present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA).
Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions.
We introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes.
arXiv Detail & Related papers (2024-02-26T14:40:15Z) - GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians [51.46168990249278]
We present an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video.
GaussianAvatar is validated on both the public dataset and our collected dataset.
arXiv Detail & Related papers (2023-12-04T18:55:45Z) - Human from Blur: Human Pose Tracking from Blurry Images [89.65036443997103]
We propose a method to estimate 3D human poses from substantially blurred images.
The key idea is to tackle the inverse problem of image deblurring by modeling the forward problem with a 3D human model, a texture map, and a sequence of poses that describe the human motion.
Using a differentiable forward model, we solve the inverse problem by backpropagating the pixel-wise reprojection error to recover the best human motion representation.
arXiv Detail & Related papers (2023-03-30T08:05:59Z)
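The forward-model-plus-backpropagation recipe described in the Human from Blur summary can be sketched on a 1D toy: synthesize a blurred observation from known motion parameters, then recover them by minimizing the pixel-wise error through the forward model. Here finite differences stand in for a differentiable renderer, and all names and the toy scene are hypothetical:

```python
import numpy as np

def render(x0, width=64, sigma=2.0):
    """Toy 1D 'scene': a Gaussian blob centered at x0."""
    px = np.arange(width)
    return np.exp(-0.5 * ((px - x0) / sigma) ** 2)

def forward(params, n_sub=16):
    """Forward blur model: average renders along a linear trajectory."""
    x_start, x_end = params
    xs = np.linspace(x_start, x_end, n_sub)
    return np.mean([render(x) for x in xs], axis=0)

def loss(params, target):
    """Pixel-wise squared error between synthesized and observed blur."""
    return np.sum((forward(params) - target) ** 2)

true_params = np.array([24.0, 40.0])
observed = forward(true_params)          # the blurry "image"

params = np.array([28.0, 36.0])          # rough initial motion guess
eps, lr = 1e-4, 1.0
for _ in range(2000):
    # Central finite-difference gradient of the loss w.r.t. the two
    # trajectory parameters.
    grad = np.array([
        (loss(params + eps * np.eye(2)[i], observed)
         - loss(params - eps * np.eye(2)[i], observed)) / (2 * eps)
        for i in range(2)
    ])
    params -= lr * grad

print(loss(params, observed))  # should be near zero after fitting
```

In the actual papers the forward model is a full differentiable renderer over a posed human body, so the same gradient-descent loop recovers entire pose sequences rather than two scalars.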
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.