Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image
- URL: http://arxiv.org/abs/2311.13199v3
- Date: Tue, 20 Feb 2024 02:57:30 GMT
- Title: Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image
- Authors: Zijian Kuang, Lihang Ying, Shi Jin, Li Cheng
- Abstract summary: We propose a combination of two-stage supervised and self-supervised training to overcome the difficulty of obtaining animal cooperation for 3D scanning.
Results of our study demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative aspects of bird 3D digitization.
- Score: 30.997936022365018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Pixel-aligned Implicit Function (PIFu) effectively captures
subtle variations in body shape within a low-dimensional space through
extensive training on human 3D scans, its application to live animals presents
formidable challenges due to the difficulty of obtaining animal cooperation for
3D scanning. To address this challenge, we propose a combination of two-stage
supervised and self-supervised training. In the first stage, we leverage synthetic animal
models for supervised learning. This allows the model to learn from a diverse
set of virtual animal instances. In the second stage, we use 2D multi-view
consistency as a self-supervised training method. This further enhances the
model's ability to reconstruct accurate and realistic 3D shape and texture from
widely available single-view images of real animals. The results of our study
demonstrate that our approach outperforms state-of-the-art methods in both
quantitative and qualitative aspects of bird 3D digitization. The source code
is available at https://github.com/kuangzijian/drifu-for-animals.
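As a rough illustration of the training schedule described in the abstract, the PyTorch-style sketch below wires the two stages together. It is a minimal sketch, not the authors' implementation: `model` (a PIFu-style network returning occupancy probabilities for sampled query points), `renderer`, the data loaders, and `cam_poses` are assumed interfaces rather than names taken from the linked repository.

```python
import torch.nn.functional as F

def train_two_stage(model, renderer, synthetic_loader, real_loader,
                    optimizer, cam_poses, epochs=(10, 10)):
    # Stage 1: supervised learning on synthetic animal models, where
    # ground-truth occupancy labels exist for sampled 3D query points.
    for _ in range(epochs[0]):
        for image, points, gt_occ in synthetic_loader:
            pred = model(image, points)              # PIFu-style occupancy query
            loss = F.binary_cross_entropy(pred, gt_occ)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Stage 2: self-supervision on real single-view images via 2D multi-view
    # consistency: occupancy predicted from the input view and occupancy
    # re-predicted from a rendered novel view should agree.
    for _ in range(epochs[1]):
        for image, points in real_loader:
            pred = model(image, points)
            loss = 0.0
            for pose in cam_poses:
                novel = renderer(model, image, pose)   # render the prediction
                pred_novel = model(novel, points)      # re-predict occupancy
                loss = loss + F.mse_loss(pred_novel, pred)
            loss = loss / len(cam_poses)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

The point mirrored here is that the second stage needs no 3D labels: supervision comes entirely from requiring the 3D prediction to stay stable when re-estimated from rendered novel views of itself.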
Related papers
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data [36.51674664590734]
We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars.
Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalanced viewing angles and pose priors, our approach aims to develop a zero-shot 3D generative scheme capable of producing realistic 3D humans.
arXiv Detail & Related papers (2024-01-02T12:06:31Z)
- Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape [32.11280929126699]
We propose Animal3D, the first comprehensive dataset for mammalian 3D pose and shape estimation.
Animal3D consists of 3379 images collected from 40 mammal species, high-quality annotations of 26 keypoints, and, importantly, the pose and shape parameters of the SMAL model.
Based on the Animal3D dataset, we benchmark representative shape and pose estimation models in three settings: (1) supervised learning from only the Animal3D data, (2) synthetic-to-real transfer from synthetically generated images, and (3) fine-tuning human pose and shape estimation models.
arXiv Detail & Related papers (2023-08-22T18:57:07Z)
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
- MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns a predictor of articulated 3D shape and appearance purely from in-the-wild single-view images of the object category.
At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
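To make the implicit-explicit idea concrete, here is a generic sketch of converting a neural field into an explicit mesh; it is not MagicPony's actual code, and `sdf_net` is a hypothetical callable mapping 3D points to signed distances. The field supplies smooth, differentiable geometry, while the extracted mesh supports efficient downstream rendering.

```python
import numpy as np
import torch
from skimage import measure  # provides marching_cubes

def field_to_mesh(sdf_net, resolution=64, bound=1.0):
    # Sample the signed-distance field on a dense grid over [-bound, bound]^3.
    xs = np.linspace(-bound, bound, resolution, dtype=np.float32)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    with torch.no_grad():
        sdf = sdf_net(torch.from_numpy(grid.reshape(-1, 3)))
    sdf = sdf.reshape(resolution, resolution, resolution).cpu().numpy()
    # Extract the zero level set as an explicit triangle mesh.
    verts, faces, _, _ = measure.marching_cubes(sdf, level=0.0)
    # Map voxel indices back to world coordinates in [-bound, bound].
    verts = verts / (resolution - 1) * (2 * bound) - bound
    return verts, faces
```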
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
- LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery [72.3681707384754]
We propose a practical problem setting to estimate 3D pose and shape of animals given only a few in-the-wild images of a particular animal species.
We do not assume any form of 2D or 3D ground-truth annotations, nor do we leverage any multi-view or temporal information.
We propose LASSIE, a novel optimization framework that discovers 3D parts in a self-supervised manner.
arXiv Detail & Related papers (2022-07-07T17:00:07Z)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision signals to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator and a pose hallucinator.
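A heavily simplified sketch of this co-evolution loop follows; every name and method here (`hallucinator.sample`, `imitator.filter`, `estimator.train_step`, and so on) is a hypothetical interface used only to show the data flow, not the paper's API.

```python
def project_to_2d(pose_3d, scale=1.0):
    # Hypothetical weak-perspective projection: keep x, y and apply a scale.
    return scale * pose_3d[..., :2]

def co_evolve(estimator, imitator, hallucinator, rounds=3):
    # Each round: hallucinate 3D poses, keep the physically imitable ones,
    # and turn them into explicit 2D-3D pairs that supervise the estimator.
    for _ in range(rounds):
        poses_3d = hallucinator.sample()        # propose candidate 3D poses
        plausible = imitator.filter(poses_3d)   # RL imitator rejects implausible ones
        pairs = [(project_to_2d(p), p) for p in plausible]
        estimator.train_step(pairs)             # explicit 2D-3D supervision
        hallucinator.update(plausible)          # co-evolve with refined poses
```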
arXiv Detail & Related papers (2022-03-29T14:45:53Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on 3DPW, an in-the-wild human video dataset.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.