Two-stage Synthetic Supervising and Multi-view Consistency
Self-supervising based Animal 3D Reconstruction by Single Image
- URL: http://arxiv.org/abs/2311.13199v3
- Date: Tue, 20 Feb 2024 02:57:30 GMT
- Title: Two-stage Synthetic Supervising and Multi-view Consistency
Self-supervising based Animal 3D Reconstruction by Single Image
- Authors: Zijian Kuang, Lihang Ying, Shi Jin, Li Cheng
- Abstract summary: We propose a two-stage combination of supervised and self-supervised training to address the difficulty of obtaining animal cooperation for 3D scanning.
Results of our study demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative aspects of bird 3D digitization.
- Score: 30.997936022365018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Pixel-aligned Implicit Function (PIFu) effectively captures
subtle variations in body shape within a low-dimensional space through
extensive training with human 3D scans, its application to live animals
presents formidable challenges due to the difficulty of obtaining animal
cooperation for 3D scanning. To address this challenge, we propose a
two-stage combination of supervised and self-supervised training. In the
first stage, we leverage synthetic animal
models for supervised learning. This allows the model to learn from a diverse
set of virtual animal instances. In the second stage, we use 2D multi-view
consistency as a self-supervised training method. This further enhances the
model's ability to reconstruct accurate and realistic 3D shape and texture from
widely available single-view images of real animals. The results of our study
demonstrate that our approach outperforms state-of-the-art methods in both
quantitative and qualitative aspects of bird 3D digitization. The source code
is available at https://github.com/kuangzijian/drifu-for-animals.
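Since the abstract only outlines the method, below is a minimal, illustrative PyTorch sketch of the two training stages it describes: PIFu-style pixel-aligned feature sampling, a stage-1 supervised occupancy loss against synthetic 3D ground truth, and a stage-2 self-supervised consistency term. The encoder/decoder interface, the calibration convention, and the exact form of the multi-view consistency loss are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

    # Illustrative sketch only, not the authors' code. Assumes a PIFu-style
    # model: an image encoder producing a feature map, plus an implicit MLP
    # decoder queried at 3D points.
    import torch
    import torch.nn.functional as F

    def pixel_aligned_features(feat_map, points, calib):
        """Project 3D query points into the image and sample 2D features.

        feat_map: (B, C, H, W) encoder features
        points:   (B, N, 3) 3D query points
        calib:    (B, 3, 4) projection, assumed to map into [-1, 1] NDC
        """
        homo = torch.cat([points, torch.ones_like(points[..., :1])], dim=-1)
        proj = torch.einsum('bij,bnj->bni', calib, homo)     # (B, N, 3)
        uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)  # (B, N, 2)
        sampled = F.grid_sample(feat_map, uv.unsqueeze(2),   # (B, C, N, 1)
                                align_corners=True)
        return sampled.squeeze(-1).transpose(1, 2)           # (B, N, C)

    def stage1_supervised_loss(model, img, calib, points, gt_occ):
        """Stage 1: occupancy supervision from synthetic animal models,
        where ground-truth inside/outside labels are available."""
        feats = pixel_aligned_features(model.encoder(img), points, calib)
        occ = model.decoder(feats, points)  # assumed sigmoid output in [0, 1]
        return F.binary_cross_entropy(occ, gt_occ)

    def stage2_consistency_loss(model, img_a, calib_a, img_b, calib_b, points):
        """Stage 2: one plausible multi-view consistency term. The same 3D
        query points, observed through two views of the same animal, should
        yield matching predictions; no 3D ground truth is required."""
        pred_a = model.decoder(
            pixel_aligned_features(model.encoder(img_a), points, calib_a), points)
        pred_b = model.decoder(
            pixel_aligned_features(model.encoder(img_b), points, calib_b), points)
        return F.l1_loss(pred_a, pred_b)

In practice the stage-2 term could equally be computed between 2D renderings of the predicted shape and texture; the sketch above only illustrates the overall two-stage structure.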
Related papers
- Generative Zoo [41.65977386204797]
We introduce a pipeline that samples a diverse set of poses and shapes for a variety of mammalian quadrupeds and generates realistic images with corresponding ground-truth pose and shape parameters.
We train a 3D pose and shape regressor on GenZoo, which achieves state-of-the-art performance on a real-world animal pose and shape estimation benchmark.
arXiv Detail & Related papers (2024-12-11T04:57:53Z)
- 3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance [61.06034736050515]
We introduce a method capable of generating camera-controlled viewpoints from a single input image.
Our method excels in handling complex and diverse scenes without extensive training or additional 3D and multiview data.
arXiv Detail & Related papers (2024-08-12T13:53:40Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape [32.11280929126699]
We propose Animal3D, the first comprehensive dataset for mammalian 3D pose and shape estimation.
Animal3D consists of 3379 images collected from 40 mammal species, high-quality annotations of 26 keypoints, and importantly the pose and shape parameters of the SMAL model.
Based on the Animal3D dataset, we benchmark representative shape and pose estimation models in three settings: (1) supervised learning from only the Animal3D data, (2) synthetic-to-real transfer from synthetically generated images, and (3) fine-tuning human pose and shape estimation models.
arXiv Detail & Related papers (2023-08-22T18:57:07Z)
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
- MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns a single-image 3D predictor purely from in-the-wild single-view images of the object category.
At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
- LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery [72.3681707384754]
We propose a practical problem setting to estimate 3D pose and shape of animals given only a few in-the-wild images of a particular animal species.
We do not assume any form of 2D or 3D ground-truth annotations, nor do we leverage any multi-view or temporal information.
We propose LASSIE, a novel optimization framework which discovers 3D parts in a self-supervised manner.
arXiv Detail & Related papers (2022-07-07T17:00:07Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on the in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.