Virtual Pets: Animatable Animal Generation in 3D Scenes
- URL: http://arxiv.org/abs/2312.14154v1
- Date: Thu, 21 Dec 2023 18:59:30 GMT
- Title: Virtual Pets: Animatable Animal Generation in 3D Scenes
- Authors: Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey
Tulyakov, Alexander Schwing, Liangyan Gui, Hsin-Ying Lee
- Abstract summary: We introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.
We leverage monocular internet videos and extract deformable NeRF representations for the foreground and static NeRF representations for the background.
We develop a reconstruction strategy, encompassing species-level shared template learning and per-video fine-tuning.
- Score: 84.0990909455833
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Toward unlocking the potential of generative models in immersive 4D
experiences, we introduce Virtual Pet, a novel pipeline to model realistic and
diverse motions for target animal species within a 3D environment. To
circumvent the limited availability of 3D motion data aligned with
environmental geometry, we leverage monocular internet videos and extract
deformable NeRF representations for the foreground and static NeRF
representations for the background. For this, we develop a reconstruction
strategy, encompassing species-level shared template learning and per-video
fine-tuning. Utilizing the reconstructed data, we then train a conditional 3D
motion model to learn the trajectory and articulation of foreground animals in
the context of 3D backgrounds. We showcase the efficacy of our pipeline with
comprehensive qualitative and quantitative evaluations using cat videos. We
also demonstrate versatility across unseen cats and indoor environments,
producing temporally coherent 4D outputs for enriched virtual experiences.
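As an illustration of the kind of conditional 3D motion model the abstract describes, here is a minimal PyTorch sketch. The module layout, feature dimensions, horizon, joint count, and output parameterization (per-frame root translation plus yaw, axis-angle joint rotations) are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of a conditional 3D motion model: map a background scene
# encoding to a root trajectory and per-joint articulation over T frames.
# All names, dimensions, and parameterizations are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionalMotionModel(nn.Module):
    def __init__(self, scene_dim=256, hidden=512, num_joints=24, horizon=60):
        super().__init__()
        self.horizon = horizon
        self.num_joints = num_joints
        self.encoder = nn.Sequential(
            nn.Linear(scene_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Root trajectory: xyz translation plus heading (yaw) per frame.
        self.traj_head = nn.Linear(hidden, horizon * 4)
        # Articulation: an axis-angle rotation per joint per frame.
        self.artic_head = nn.Linear(hidden, horizon * num_joints * 3)

    def forward(self, scene_feat):
        h = self.encoder(scene_feat)                                    # (B, hidden)
        traj = self.traj_head(h).view(-1, self.horizon, 4)              # (B, T, 4)
        artic = self.artic_head(h).view(-1, self.horizon, self.num_joints, 3)
        return traj, artic

# Usage: condition on a pooled background feature (e.g. derived from the
# static background NeRF) and predict a motion clip for the foreground animal.
model = ConditionalMotionModel()
scene_feat = torch.randn(2, 256)             # placeholder scene encodings
trajectory, articulation = model(scene_feat)
print(trajectory.shape, articulation.shape)  # (2, 60, 4) (2, 60, 24, 3)
```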
Related papers
- Animate3D: Animating Any 3D Model with Multi-view Video Diffusion [47.05131487114018]
Animate3D is a novel framework for animating any static 3D model.
It combines reconstruction with 4D Score Distillation Sampling (4D-SDS) to leverage multi-view video diffusion priors for animating 3D objects.
arXiv Detail & Related papers (2024-07-16T05:35:57Z)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation.
Our method begins by generating a reference video with a video generation model.
We then learn a canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
- TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
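To make the global/local factorization concrete, here is a minimal sketch (an assumption-laden illustration, not TC4D's code): canonical points receive predicted local offsets and are then carried along a global trajectory, simplified here to a per-frame translation plus a yaw rotation.

```python
# Minimal sketch of trajectory-conditioned animation (illustrative assumptions):
# local deformation in the canonical frame, then global rigid motion along a
# trajectory.
import torch

def animate_points(canonical_pts, traj_xyz, traj_yaw, local_offsets):
    """canonical_pts: (N, 3) points of the canonical shape.
    traj_xyz: (T, 3) global translation per frame.
    traj_yaw: (T,) heading angle per frame (rotation about the up axis).
    local_offsets: (T, N, 3) local deformation predicted by a motion network."""
    T = traj_xyz.shape[0]
    cos, sin = torch.cos(traj_yaw), torch.sin(traj_yaw)
    R = torch.zeros(T, 3, 3)
    R[:, 0, 0], R[:, 0, 1] = cos, -sin
    R[:, 1, 0], R[:, 1, 1] = sin, cos
    R[:, 2, 2] = 1.0
    deformed = canonical_pts.unsqueeze(0) + local_offsets        # (T, N, 3)
    # Rotate each frame's deformed points, then translate along the trajectory.
    return torch.einsum('tij,tnj->tni', R, deformed) + traj_xyz[:, None, :]
```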
arXiv Detail & Related papers (2024-03-26T17:55:11Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that jointly learns a pan-category deformable 3D animal model for more than 100 animal species.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Reconstructing Animatable Categories from Videos [65.14948977749269]
Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging.
We present RAC, which builds category-level 3D models from monocular videos while disentangling variation across instances from motion over time.
We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
arXiv Detail & Related papers (2023-05-10T17:56:21Z)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories [80.30216777363057]
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets.
At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views.
Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
- ZooBuilder: 2D and 3D Pose Estimation for Quadrupeds Using Synthetic Data [2.3661942553209236]
We train 2D and 3D pose estimation models on synthetic data and assemble them into an end-to-end pipeline called ZooBuilder.
The pipeline takes as input a video of an animal in the wild, and generates the corresponding 2D and 3D coordinates for each joint of the animal's skeleton.
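As an interface-level sketch of such a pipeline (the `detector_2d` and `lifter_3d` callables are hypothetical placeholders, not ZooBuilder components), the video-to-skeleton contract could look like this:

```python
# Interface-level sketch of a ZooBuilder-style pipeline; `detector_2d` and
# `lifter_3d` are hypothetical placeholders standing in for trained models.
import numpy as np

def run_pose_pipeline(frames, detector_2d, lifter_3d):
    """frames: list of HxWx3 images from a video of an animal.
    detector_2d: maps one frame to (J, 2) pixel keypoints.
    lifter_3d: maps a (T, J, 2) keypoint sequence to (T, J, 3) coordinates."""
    keypoints_2d = np.stack([detector_2d(frame) for frame in frames])  # (T, J, 2)
    joints_3d = lifter_3d(keypoints_2d)                                # (T, J, 3)
    return keypoints_2d, joints_3d
```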
arXiv Detail & Related papers (2020-09-01T07:41:20Z)