Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos
- URL: http://arxiv.org/abs/2312.13604v1
- Date: Thu, 21 Dec 2023 06:44:18 GMT
- Title: Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos
- Authors: Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu,
Shangzhe Wu
- Abstract summary: We introduce a new method for learning a generative model of articulated 3D animal motions from raw unlabeled online videos.
Our model does not require any pose annotations or shape models for training, and is learned purely from a collection of raw video clips obtained from the Internet.
- Score: 50.83155160955368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Ponymation, a new method for learning a generative model of
articulated 3D animal motions from raw, unlabeled online videos. Unlike
existing approaches for motion synthesis, our model does not require any pose
annotations or parametric shape models for training, and is learned purely from
a collection of raw video clips obtained from the Internet. We build upon a
recent work, MagicPony, which learns articulated 3D animal shapes purely from
single image collections, and extend it on two fronts. First, instead of
training on static images, we augment the framework with a video training
pipeline that incorporates temporal regularizations, achieving more accurate
and temporally consistent reconstructions. Second, we learn a generative model
of the underlying articulated 3D motion sequences via a spatio-temporal
transformer VAE, simply using 2D reconstruction losses without relying on any
explicit pose annotations. At inference time, given a single 2D image of a new
animal instance, our model reconstructs an articulated, textured 3D mesh, and
generates plausible 3D animations by sampling from the learned motion latent
space.
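The training objective described in the abstract — a motion VAE supervised only by 2D reconstruction losses, with no pose annotations — can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: the toy linear encoder/decoder stands in for the spatio-temporal transformer, the orthographic camera stands in for the full rendering pipeline, and all dimensions and weight names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, D = 16, 20, 32  # hypothetical: frames, 3D joints per frame, latent size

# Toy linear stand-ins for the paper's spatio-temporal transformer VAE.
enc_W = rng.standard_normal((2 * D, T * J * 3)) * 0.01
dec_W = rng.standard_normal((T * J * 3, D)) * 0.01

def encode(motion_3d):
    """Map a 3D joint sequence (T, J, 3) to a latent mean and log-variance."""
    stats = enc_W @ motion_3d.reshape(-1)
    return stats[:D], stats[D:]

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) via the standard reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """Map a latent motion code back to a 3D joint sequence."""
    return (dec_W @ z).reshape(T, J, 3)

def project_2d(joints_3d):
    """Toy orthographic camera: keep x, y and drop depth."""
    return joints_3d[..., :2]

def vae_loss(motion_3d, observed_2d, kl_weight=1e-4):
    """VAE objective: 2D reprojection error plus KL, no 3D pose labels."""
    mu, log_var = encode(motion_3d)
    recon_3d = decode(reparameterize(mu, log_var))
    # Supervision comes only from 2D observations, as in the paper.
    recon_loss = np.mean((project_2d(recon_3d) - observed_2d) ** 2)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon_loss + kl_weight * kl

motion = rng.standard_normal((T, J, 3))
obs_2d = project_2d(motion) + 0.01 * rng.standard_normal((T, J, 2))
print(vae_loss(motion, obs_2d))
```

At inference, sampling `z` from the prior and decoding yields a novel motion sequence, which mirrors how the paper animates a reconstructed mesh by sampling the learned motion latent space.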
Related papers
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Virtual Pets: Animatable Animal Generation in 3D Scenes [84.0990909455833]
We introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.
We leverage monocular internet videos and extract deformable NeRF representations for the foreground and static NeRF representations for the background.
We develop a reconstruction strategy, encompassing species-level shared template learning and per-video fine-tuning.
arXiv Detail & Related papers (2023-12-21T18:59:30Z)
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
- Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering [5.568218439349004]
Inferring 3D human pose from 2D images is a challenging and long-standing problem in the field of computer vision.
We present preliminary results for a method to estimate 3D pose from 2D video containing a single person.
arXiv Detail & Related papers (2022-10-10T09:24:07Z)
- DOVE: Learning Deformable 3D Objects by Watching Videos [89.43105063468077]
We present DOVE, which learns to predict 3D canonical shape, deformation, viewpoint and texture from a single 2D image of a bird.
Our method reconstructs temporally consistent 3D shape and deformation, which allows us to animate and re-render the bird from arbitrary viewpoints.
arXiv Detail & Related papers (2021-07-22T17:58:10Z)
- Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild [22.881898195409885]
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video.
The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction.
arXiv Detail & Related papers (2020-12-23T18:50:42Z)
- Online Adaptation for Consistent Mesh Reconstruction in the Wild [147.22708151409765]
We pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video.
We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects including those of animals captured in the wild.
arXiv Detail & Related papers (2020-12-06T07:22:27Z)
- Unsupervised object-centric video generation and decomposition in 3D [36.08064849807464]
We propose to model a video as the view seen while moving through a scene with multiple 3D objects and a 3D background.
Our model is trained from monocular videos without any supervision, yet learns to generate coherent 3D scenes containing several moving objects.
arXiv Detail & Related papers (2020-07-07T18:01:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.