Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos
- URL: http://arxiv.org/abs/2312.13604v1
- Date: Thu, 21 Dec 2023 06:44:18 GMT
- Title: Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos
- Authors: Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu,
Shangzhe Wu
- Abstract summary: We introduce a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.
Our model does not require any pose annotations or shape models for training, and is learned purely from a collection of raw video clips obtained from the Internet.
- Score: 50.83155160955368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Ponymation, a new method for learning a generative model of
articulated 3D animal motions from raw, unlabeled online videos. Unlike
existing approaches for motion synthesis, our model does not require any pose
annotations or parametric shape models for training, and is learned purely from
a collection of raw video clips obtained from the Internet. We build upon a
recent work, MagicPony, which learns articulated 3D animal shapes purely from
single image collections, and extend it on two fronts. First, instead of
training on static images, we augment the framework with a video training
pipeline that incorporates temporal regularizations, achieving more accurate
and temporally consistent reconstructions. Second, we learn a generative model
of the underlying articulated 3D motion sequences via a spatio-temporal
transformer VAE, simply using 2D reconstruction losses without relying on any
explicit pose annotations. At inference time, given a single 2D image of a new
animal instance, our model reconstructs an articulated, textured 3D mesh, and
generates plausible 3D animations by sampling from the learned motion latent
space.
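The abstract describes learning a spatio-temporal transformer VAE over articulated 3D motion sequences, supervised only through 2D reconstruction losses on the rendered mesh. The following is a minimal, hypothetical PyTorch sketch of such a motion VAE, not the authors' implementation: the pose dimensionality, mean pooling over time, and the direct pose-space reconstruction in forward() are assumptions for brevity (the paper instead renders the decoded articulation and compares against video frames).

```python
import torch
import torch.nn as nn

class MotionTransformerVAE(nn.Module):
    """Sketch of a spatio-temporal transformer VAE over per-frame articulation vectors."""

    def __init__(self, pose_dim=60, latent_dim=128, d_model=256,
                 n_heads=4, n_layers=4, max_frames=64):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        # learned temporal position embeddings, shared by encoder and decoder queries
        self.pos = nn.Parameter(torch.randn(max_frames, d_model) * 0.02)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.latent_proj = nn.Linear(latent_dim, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.out = nn.Linear(d_model, pose_dim)

    def encode(self, poses):
        # poses: (B, T, pose_dim) articulation parameters for each frame
        T = poses.size(1)
        tokens = self.encoder(self.embed(poses) + self.pos[:T])
        pooled = tokens.mean(dim=1)                      # pool over time
        return self.to_mu(pooled), self.to_logvar(pooled)

    def decode(self, z, T):
        # one query token per output frame attends to a single latent "memory" token
        memory = self.latent_proj(z).unsqueeze(1)        # (B, 1, d_model)
        queries = self.pos[:T].unsqueeze(0).expand(z.size(0), -1, -1)
        return self.out(self.decoder(queries, memory))   # (B, T, pose_dim)

    def forward(self, poses):
        mu, logvar = self.encode(poses)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decode(z, poses.size(1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl
```

At inference time, in the spirit of the abstract, one would sample z from a standard normal and call decode(z, T) to obtain a motion sequence that drives the articulated, textured mesh recovered from a single input image.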
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos [26.65191922949358]
We present a method to build animatable dog avatars from monocular videos.
This is challenging as animals display a range of (unpredictable) non-rigid movements and have a variety of appearance details.
We develop an approach that links the video frames via a 4D solution that jointly solves for the animal's pose variation and its appearance.
arXiv Detail & Related papers (2024-03-25T18:41:43Z)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z)
- Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering [5.568218439349004]
Inferring 3D human pose from 2D images is a challenging and long-standing problem in the field of computer vision.
We present preliminary results for a method to estimate 3D pose from 2D video containing a single person.
arXiv Detail & Related papers (2022-10-10T09:24:07Z)
- DOVE: Learning Deformable 3D Objects by Watching Videos [89.43105063468077]
We present DOVE, which learns to predict 3D canonical shape, deformation, viewpoint and texture from a single 2D image of a bird.
Our method reconstructs temporally consistent 3D shape and deformation, which allows us to animate and re-render the bird from arbitrary viewpoints.
arXiv Detail & Related papers (2021-07-22T17:58:10Z)
- LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs non-rigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z)
- Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild [22.881898195409885]
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video.
The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction.
arXiv Detail & Related papers (2020-12-23T18:50:42Z)
- Online Adaptation for Consistent Mesh Reconstruction in the Wild [147.22708151409765]
We pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video.
We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects including those of animals captured in the wild.
arXiv Detail & Related papers (2020-12-06T07:22:27Z)
- Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)