Two-stage Synthetic Supervising and Multi-view Consistency
Self-supervising based Animal 3D Reconstruction by Single Image
- URL: http://arxiv.org/abs/2311.13199v3
- Date: Tue, 20 Feb 2024 02:57:30 GMT
- Title: Two-stage Synthetic Supervising and Multi-view Consistency
Self-supervising based Animal 3D Reconstruction by Single Image
- Authors: Zijian Kuang, Lihang Ying, Shi Jin, Li Cheng
- Abstract summary: We propose a two-stage combination of supervised and self-supervised training to address the difficulty of obtaining animal cooperation for 3D scanning.
Results of our study demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative aspects of bird 3D digitization.
- Score: 30.997936022365018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Pixel-aligned Implicit Function (PIFu) effectively captures
subtle variations in body shape within a low-dimensional space through
extensive training with human 3D scans, its application to live animals
presents formidable challenges due to the difficulty of obtaining animal
cooperation for 3D scanning. To address this challenge, we propose a
two-stage combination of supervised and self-supervised training. In the
first stage, we leverage synthetic animal
models for supervised learning. This allows the model to learn from a diverse
set of virtual animal instances. In the second stage, we use 2D multi-view
consistency as a self-supervised training method. This further enhances the
model's ability to reconstruct accurate and realistic 3D shape and texture from
widely available single-view images of real animals. The results of our study
demonstrate that our approach outperforms state-of-the-art methods in both
quantitative and qualitative aspects of bird 3D digitization. The source code
is available at https://github.com/kuangzijian/drifu-for-animals.
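Since the abstract only outlines the method, below is a minimal, illustrative PyTorch sketch of the two training stages it describes: PIFu-style pixel-aligned feature sampling, a stage-1 supervised occupancy loss against synthetic 3D ground truth, and a stage-2 self-supervised consistency term. The encoder/decoder interface, the calibration convention, and the exact form of the multi-view consistency loss are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

    # Illustrative sketch only, not the authors' code. Assumes a PIFu-style
    # model: an image encoder producing a feature map, plus an implicit MLP
    # decoder queried at 3D points.
    import torch
    import torch.nn.functional as F

    def pixel_aligned_features(feat_map, points, calib):
        """Project 3D query points into the image and sample 2D features.

        feat_map: (B, C, H, W) encoder features
        points:   (B, N, 3) 3D query points
        calib:    (B, 3, 4) projection, assumed to map into [-1, 1] NDC
        """
        homo = torch.cat([points, torch.ones_like(points[..., :1])], dim=-1)
        proj = torch.einsum('bij,bnj->bni', calib, homo)     # (B, N, 3)
        uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)  # (B, N, 2)
        sampled = F.grid_sample(feat_map, uv.unsqueeze(2),   # (B, C, N, 1)
                                align_corners=True)
        return sampled.squeeze(-1).transpose(1, 2)           # (B, N, C)

    def stage1_supervised_loss(model, img, calib, points, gt_occ):
        """Stage 1: occupancy supervision from synthetic animal models,
        where ground-truth inside/outside labels are available."""
        feats = pixel_aligned_features(model.encoder(img), points, calib)
        occ = model.decoder(feats, points)  # assumed sigmoid output in [0, 1]
        return F.binary_cross_entropy(occ, gt_occ)

    def stage2_consistency_loss(model, img_a, calib_a, img_b, calib_b, points):
        """Stage 2: one plausible multi-view consistency term. The same 3D
        query points, observed through two views of the same animal, should
        yield matching predictions; no 3D ground truth is required."""
        pred_a = model.decoder(
            pixel_aligned_features(model.encoder(img_a), points, calib_a), points)
        pred_b = model.decoder(
            pixel_aligned_features(model.encoder(img_b), points, calib_b), points)
        return F.l1_loss(pred_a, pred_b)

In practice the stage-2 term could equally be computed between 2D renderings of the predicted shape and texture; the sketch above only illustrates the overall two-stage structure.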
Related papers
- Generative Zoo [41.65977386204797]
We introduce a pipeline that samples a diverse set of poses and shapes for a variety of mammalian quadrupeds and generates realistic images with corresponding ground-truth pose and shape parameters.
We train a 3D pose and shape regressor on GenZoo, which achieves state-of-the-art performance on a real-world animal pose and shape estimation benchmark.
arXiv Detail & Related papers (2024-12-11T04:57:53Z)
- 3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance [61.06034736050515]
We introduce a method capable of generating camera-controlled viewpoints from a single input image.
Our method excels in handling complex and diverse scenes without extensive training or additional 3D and multiview data.
arXiv Detail & Related papers (2024-08-12T13:53:40Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape [32.11280929126699]
We propose Animal3D, the first comprehensive dataset for mammalian 3D pose and shape estimation.
Animal3D consists of 3379 images collected from 40 mammal species, high-quality annotations of 26 keypoints, and importantly the pose and shape parameters of the SMAL model.
Based on the Animal3D dataset, we benchmark representative shape and pose estimation models in three settings: (1) supervised learning from only the Animal3D data, (2) synthetic-to-real transfer from synthetically generated images, and (3) fine-tuning human pose and shape estimation models.
arXiv Detail & Related papers (2023-08-22T18:57:07Z)
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
- MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns a single-image 3D predictor purely from in-the-wild single-view images of the object category.
At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
- LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery [72.3681707384754]
We propose a practical problem setting to estimate 3D pose and shape of animals given only a few in-the-wild images of a particular animal species.
We do not assume any form of 2D or 3D ground-truth annotations, nor do we leverage any multi-view or temporal information.
We propose LASSIE, a novel optimization framework which discovers 3D parts in a self-supervised manner.
arXiv Detail & Related papers (2022-07-07T17:00:07Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on the in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.