Related papers: 4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos

URL: http://arxiv.org/abs/2507.10437v1
Date: Mon, 14 Jul 2025 16:24:31 GMT
Title: 4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos
Authors: Shanshan Zhong, Jiawei Peng, Zehan Zheng, Zhongzhan Huang, Wufei Ma, Guofeng Zhang, Qihao Liu, Alan Yuille, Jieneng Chen
Abstract summary: We propose 4D-Animal, a novel framework that reconstructs animatable 3D animals from videos without requiring sparse keypoint annotations. Our approach introduces a dense feature network that maps 2D representations to SMAL parameters, enhancing both the efficiency and stability of the fitting process.
Score: 15.063635374924209
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing methods for reconstructing animatable 3D animals from videos typically rely on sparse semantic keypoints to fit parametric models. However, obtaining such keypoints is labor-intensive, and keypoint detectors trained on limited animal data are often unreliable. To address this, we propose 4D-Animal, a novel framework that reconstructs animatable 3D animals from videos without requiring sparse keypoint annotations. Our approach introduces a dense feature network that maps 2D representations to SMAL parameters, enhancing both the efficiency and stability of the fitting process. Furthermore, we develop a hierarchical alignment strategy that integrates silhouette, part-level, pixel-level, and temporal cues from pre-trained 2D visual models to produce accurate and temporally coherent reconstructions across frames. Extensive experiments demonstrate that 4D-Animal outperforms both model-based and model-free baselines. Moreover, the high-quality 3D assets generated by our method can benefit other 3D tasks, underscoring its potential for large-scale applications. The code is released at https://github.com/zhongshsh/4D-Animal.

Related papers

ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images [47.682942867405224]
ConDense is a framework for 3D pre-training utilizing existing 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline.
arXiv Detail & Related papers (2024-08-30T05:57:01Z)
Virtual Pets: Animatable Animal Generation in 3D Scenes [84.0990909455833]
We introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment. We leverage monocular internet videos and extract deformable NeRF representations for the foreground and static NeRF representations for the background. We develop a reconstruction strategy, encompassing species-level shared template learning and per-video fine-tuning.
arXiv Detail & Related papers (2023-12-21T18:59:30Z)
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos [47.97168047776216]
We introduce a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos. Our model learns purely from a collection of unlabeled web video clips, leveraging semantic correspondences distilled from self-supervised image features.
arXiv Detail & Related papers (2023-12-21T06:44:18Z)
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [111.16358607889609]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
Reconstructing Animatable Categories from Videos [65.14948977749269]
Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging. We present RAC that builds category 3D models from monocular videos while disentangling variations over instances and motion over time. We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
arXiv Detail & Related papers (2023-05-10T17:56:21Z)
Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories [80.30216777363057]
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets. At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views. Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies for gait recognition are dominated by 2D representations like the silhouette or skeleton of the human body in constrained scenes. This paper aims to explore dense 3D representations for gait recognition in the wild. We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z)
DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only. Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species. We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
ZooBuilder: 2D and 3D Pose Estimation for Quadrupeds Using Synthetic Data [2.3661942553209236]
We train 2D and 3D pose estimation models with synthetic data, and put in place an end-to-end pipeline called ZooBuilder. The pipeline takes as input a video of an animal in the wild, and generates the corresponding 2D and 3D coordinates for each joint of the animal's skeleton.
arXiv Detail & Related papers (2020-09-01T07:41:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.