Reconstructing Animals and the Wild
- URL: http://arxiv.org/abs/2411.18807v1
- Date: Wed, 27 Nov 2024 23:24:27 GMT
- Title: Reconstructing Animals and the Wild
- Authors: Peter Kulits, Michael J. Black, Silvia Zuffi
- Abstract summary: We propose a method to reconstruct natural scenes from single images.
We base our approach on recent advances leveraging the strong world priors ingrained in Large Language Models.
We propose a synthetic dataset comprising one million images and thousands of assets.
- Score: 51.98009864071166
- Abstract: The idea of 3D reconstruction as scene understanding is foundational in computer vision. Reconstructing 3D scenes from 2D visual observations requires strong priors to disambiguate structure. Much work has focused on anthropocentric reconstruction, where smooth surfaces, coherent normals, and regular edges allow the integration of strong geometric inductive biases. Here, we consider a more challenging problem in which such assumptions do not hold: the reconstruction of natural scenes containing trees, bushes, boulders, and animals. While numerous works have attempted to tackle the problem of reconstructing animals in the wild, they have focused solely on the animal, neglecting environmental context. This limits their usefulness for analysis tasks, as animals exist inherently within the 3D world, and information is lost when environmental factors are disregarded. We propose a method to reconstruct natural scenes from single images. We base our approach on recent advances leveraging the strong world priors ingrained in Large Language Models and train an autoregressive model to decode a CLIP embedding into a structured compositional scene representation encompassing both animals and the wild (RAW). To enable this, we propose a synthetic dataset comprising one million images and thousands of assets. Our approach, having been trained solely on synthetic data, generalizes to the task of reconstructing animals and their environments in real-world images. We will release our dataset and code to encourage future research at https://raw.is.tue.mpg.de/
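To make the decoding step concrete, below is a minimal sketch of conditioning an autoregressive transformer on a CLIP image embedding so that it emits a sequence of scene tokens (e.g., asset identifiers and placement parameters). The architecture, vocabulary size, token layout, and dimensions here are illustrative assumptions, not the paper's released implementation; the sketch only assumes PyTorch and a 512-dimensional ViT-B/32 CLIP embedding.

```python
# Illustrative sketch (not the authors' code): an autoregressive
# transformer that decodes a CLIP image embedding into scene tokens.
# Vocabulary size, token layout, and all dimensions are assumptions.
import torch
import torch.nn as nn

class SceneDecoder(nn.Module):
    def __init__(self, vocab_size=1024, d_model=512, n_layers=6, max_len=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.clip_proj = nn.Linear(512, d_model)  # 512-d ViT-B/32 embedding
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, clip_emb, tokens):
        # clip_emb: (B, 512); tokens: (B, T) scene tokens generated so far.
        T = tokens.shape[1]
        pos = torch.arange(T, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        memory = self.clip_proj(clip_emb).unsqueeze(1)  # (B, 1, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.decoder(x, memory, tgt_mask=mask)      # causal decoding
        return self.head(h)                             # (B, T, vocab_size)

@torch.no_grad()
def decode_scene(model, clip_emb, bos=0, eos=1, max_len=64):
    """Greedy decoding: image embedding in, scene-token sequence out."""
    tokens = torch.full((clip_emb.shape[0], 1), bos,
                        dtype=torch.long, device=clip_emb.device)
    for _ in range(max_len - 1):
        logits = model(clip_emb, tokens)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, nxt], dim=1)
        if (nxt == eos).all():
            break
    return tokens  # to be mapped back to assets, poses, and scene layout
```

Under this reading, the emitted token sequence would be deserialized into a compositional scene (asset identifiers, per-object pose and shape, camera parameters), and training would minimize cross-entropy against tokenized ground-truth scenes from the synthetic dataset; the exact serialization is defined by the paper and its forthcoming code release.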
Related papers
- Generative Zoo [41.65977386204797]
We introduce a pipeline that samples a diverse set of poses and shapes for a variety of mammalian quadrupeds and generates realistic images with corresponding ground-truth pose and shape parameters.
We train a 3D pose and shape regressor on GenZoo, which achieves state-of-the-art performance on a real-world animal pose and shape estimation benchmark.
arXiv Detail & Related papers (2024-12-11T04:57:53Z)
- Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images [21.718426435322925]
We introduce the first method using synthetic data generation and disentanglement to learn to regress 3D shape and pose.
Our method, Dessie, surpasses existing 3D horse reconstruction methods and generalizes to other large animals like zebras, cows, and deer.
arXiv Detail & Related papers (2024-10-04T13:52:22Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition [40.46674919612935]
We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos.
Our method does not require any ground-truth supervision or priors extracted from large datasets of clothed human scans.
It solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly.
arXiv Detail & Related papers (2023-02-22T18:59:17Z)
- MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns this predictor purely from in-the-wild single-view images of the object category.
At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories [80.30216777363057]
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets.
At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views.
Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
- 3D Clothed Human Reconstruction in the Wild [67.35107130310257]
ClothWild is a 3D clothed human reconstruction framework that addresses the robustness on in-the-wild images.
We propose a weakly supervised pipeline that is trainable with 2D supervision targets of in-the-wild datasets.
Our proposed ClothWild produces much more accurate and robust results than the state-of-the-art methods.
arXiv Detail & Related papers (2022-07-20T17:33:19Z)
- LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery [72.3681707384754]
We propose a practical problem setting to estimate 3D pose and shape of animals given only a few in-the-wild images of a particular animal species.
We do not assume any form of 2D or 3D ground-truth annotations, nor do we leverage any multi-view or temporal information.
We propose LASSIE, a novel optimization framework that discovers 3D parts in a self-supervised manner.
arXiv Detail & Related papers (2022-07-07T17:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.