AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
- URL: http://arxiv.org/abs/2408.02110v2
- Date: Tue, 20 Aug 2024 07:19:58 GMT
- Title: AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
- Authors: Feichi Lu, Zijian Dong, Jie Song, Otmar Hilliges
- Abstract summary: Existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people.
We propose a novel method leveraging the personalized implicit neural avatar of each individual as a prior.
Our experimental results demonstrate state-of-the-art performance on several public datasets.
- Score: 31.904839609743448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite progress in human motion capture, existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people. This difficulty arises from reliance on accurate 2D joint estimations, which are hard to obtain due to occlusions and body contact when people are in close interaction. To address this, we propose a novel method leveraging the personalized implicit neural avatar of each individual as a prior, which significantly improves the robustness and precision of this challenging pose estimation task. Concretely, the avatars are efficiently reconstructed via layered volume rendering from sparse multi-view videos. The reconstructed avatar prior allows for the direct optimization of 3D poses based on color and silhouette rendering loss, bypassing the issues associated with noisy 2D detections. To handle interpenetration, we propose a collision loss on the overlapping shape regions of avatars to add penetration constraints. Moreover, both 3D poses and avatars are optimized in an alternating manner. Our experimental results demonstrate state-of-the-art performance on several public datasets.
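The collision loss described above can be illustrated with a toy sketch: sample 3D points and penalize locations where both avatars' occupancy fields are high, i.e. where the shapes interpenetrate. This is a hypothetical minimal example, not the authors' implementation; the learned implicit avatars are stood in for by simple soft-sphere occupancies, and the sampling grid and sharpness constant are made-up illustration values.

```python
# Hypothetical sketch of a collision loss on overlapping shape regions:
# penalize 3D points where the occupancy fields of two avatars overlap.
# Soft spheres stand in for the learned implicit neural avatars.

import math

def sphere_occupancy(point, center, radius):
    """Soft occupancy in (0, 1): near 1 inside the sphere, near 0 outside."""
    d = math.dist(point, center)
    return 1.0 / (1.0 + math.exp(10.0 * (d - radius)))

def collision_loss(points, center_a, center_b, radius=0.5):
    """Mean product of the two occupancies: nonzero only where both
    avatars claim the same region of space (interpenetration)."""
    total = 0.0
    for p in points:
        total += (sphere_occupancy(p, center_a, radius)
                  * sphere_occupancy(p, center_b, radius))
    return total / len(points)

# Coarse 1D grid of sample points along the axis between the two people.
grid = [(x * 0.1, 0.0, 0.0) for x in range(-20, 21)]

far = collision_loss(grid, (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0))   # well separated
near = collision_loss(grid, (-0.2, 0.0, 0.0), (0.2, 0.0, 0.0))  # interpenetrating
```

In the full method this penalty would be one term alongside the color and silhouette rendering losses, with pose parameters updated by gradient descent; here the loss simply grows as the two proxy shapes are pushed into each other.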
Related papers
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via
Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z) - Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z) - DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion
Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z) - RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z) - Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single-person pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z) - Permutation-Invariant Relational Network for Multi-person 3D Pose
Estimation [46.38290735670527]
Recovering multi-person 3D poses from a single RGB image is a severely ill-conditioned problem.
Recent works have shown promising results by reasoning about multiple people simultaneously, but in all cases only within a local neighborhood.
PI-Net introduces a self-attention block to reason for all people in the image at the same time and refine potentially noisy initial 3D poses.
In this paper, we model people interactions as a whole, independently of their number, and in a permutation-invariant manner, building upon the Set Transformer.
arXiv Detail & Related papers (2022-04-11T07:23:54Z) - PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of these limbs by taking advantage of the local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z) - Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from a main drawback: they are often model-based and so cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z) - Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage
Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
The performance of many existing methods drops when the target person is occluded by other objects, or when the motion is too fast or slow relative to the scale and speed of the training data.
We introduce a spatio-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z) - Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View
Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.