LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D
Part Discovery
- URL: http://arxiv.org/abs/2207.03434v1
- Date: Thu, 7 Jul 2022 17:00:07 GMT
- Title: LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D
Part Discovery
- Authors: Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein,
Ming-Hsuan Yang, Varun Jampani
- Abstract summary: We propose a practical problem setting to estimate 3D pose and shape of animals given only a few in-the-wild images of a particular animal species.
We do not assume any form of 2D or 3D ground-truth annotations, nor do we leverage any multi-view or temporal information.
Following these insights, we propose LASSIE, a novel optimization framework that discovers 3D parts in a self-supervised manner.
- Score: 72.3681707384754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating high-quality articulated 3D models of animals is challenging either
via manual creation or using 3D scanning tools. Therefore, techniques to
reconstruct articulated 3D objects from 2D images are crucial and highly
useful. In this work, we propose a practical problem setting to estimate 3D
pose and shape of animals given only a few (10-30) in-the-wild images of a
particular animal species (say, horse). Contrary to existing works that rely on
pre-defined template shapes, we do not assume any form of 2D or 3D ground-truth
annotations, nor do we leverage any multi-view or temporal information.
Moreover, each input image ensemble can contain animal instances with varying
poses, backgrounds, illuminations, and textures. Our key insight is that 3D
parts have much simpler shapes than the overall animal and that they are
robust w.r.t. animal pose articulations. Following these insights, we propose
LASSIE, a novel optimization framework that discovers 3D parts in a
self-supervised manner with minimal user intervention. A key driving force
behind LASSIE is enforcing 2D-3D part consistency using self-supervisory
deep features. Experiments on Pascal-Part and self-collected in-the-wild animal
datasets demonstrate considerably better 3D reconstructions, as well as 2D and
3D part discovery, compared to prior art. Project page:
chhankyao.github.io/lassie/
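The abstract's central mechanism, enforcing 2D-3D part consistency, can be illustrated with a minimal sketch. This is not the authors' implementation; all names below are hypothetical. Assuming each 3D part has been rendered to a binary 2D silhouette mask, and each corresponding 2D part is a binary mask obtained by clustering self-supervisory deep features, one simple per-part consistency loss penalizes their non-overlap:

```python
import numpy as np

def part_consistency_loss(rendered_masks, feature_masks):
    """Mean (1 - IoU) over part pairs: rendered 3D part silhouettes vs.
    2D masks from feature clustering. Illustrative sketch only; the
    actual LASSIE objective differs in form and detail."""
    losses = []
    for r, f in zip(rendered_masks, feature_masks):
        inter = np.logical_and(r, f).sum()
        union = np.logical_or(r, f).sum()
        # Perfect overlap (or two empty masks) gives zero loss.
        iou = inter / union if union > 0 else 1.0
        losses.append(1.0 - iou)
    return float(np.mean(losses))
```

Driving this loss toward zero encourages each discovered 3D part to project onto the image region that the 2D features assign to the same part, which is one way to read the paper's "2D-3D part consistency" supervision signal.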
Related papers
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image [30.997936022365018]
We propose a combination of two-stage supervised and self-supervised training to address the difficulty of obtaining cooperative animal subjects for 3D scanning.
Results of our study demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative aspects of bird 3D digitization.
arXiv Detail & Related papers (2023-11-22T07:06:38Z)
- Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape [32.11280929126699]
We propose Animal3D, the first comprehensive dataset for mammalian 3D pose and shape estimation.
Animal3D consists of 3379 images of 40 mammal species, with high-quality annotations of 26 keypoints and, importantly, the pose and shape parameters of the SMAL model.
Based on the Animal3D dataset, we benchmark representative shape and pose estimation models in three settings: (1) supervised learning from only the Animal3D data; (2) synthetic-to-real transfer from synthetically generated images; and (3) fine-tuning human pose and shape estimation models.
arXiv Detail & Related papers (2023-08-22T18:57:07Z)
- ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z)
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
- EVA3D: Compositional 3D Human Generation from 2D Image Collections [27.70991135165909]
EVA3D is an unconditional 3D human generative model learned from 2D image collections only.
It can sample 3D humans with detailed geometry and render high-quality images (up to 512x256) without bells and whistles.
It achieves state-of-the-art 3D human generation performance regarding both geometry and texture quality.
arXiv Detail & Related papers (2022-10-10T17:59:31Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that learns 3D reconstructions of articulated objects in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs [156.1209884183522]
State-of-the-art 2D generative models like GANs show unprecedented quality in modeling the natural image manifold.
We present the first attempt to directly mine 3D geometric cues from an off-the-shelf 2D GAN that is trained on RGB images only.
arXiv Detail & Related papers (2020-11-02T09:38:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.