Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
- URL: http://arxiv.org/abs/2004.06302v2
- Date: Sun, 3 May 2020 01:21:59 GMT
- Title: Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
- Authors: Mateusz Michalkiewicz, Sarah Parisot, Stavros Tsogkas, Mahsa
Baktashmotlagh, Anders Eriksson, Eugene Belilovsky
- Abstract summary: We show that complex encoder-decoder architectures perform similarly to nearest-neighbor baselines in standard benchmarks.
We propose three approaches that efficiently integrate a class prior into a 3D reconstruction model.
- Score: 30.262308825799167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impressive performance of deep convolutional neural networks in
single-view 3D reconstruction suggests that these models perform non-trivial
reasoning about the 3D structure of the output space. However, recent work has
challenged this belief, showing that complex encoder-decoder architectures
perform similarly to nearest-neighbor baselines or simple linear decoder models
that exploit large amounts of per-category data in standard benchmarks. On the
other hand, settings where 3D shape must be inferred for new categories with few
examples are more natural and require models that generalize about shapes. In
this work we demonstrate experimentally that naive baselines do not apply when
the goal is to learn to reconstruct novel objects using very few examples, and
that in a few-shot learning setting, the network must learn concepts
that can be applied to new categories, avoiding rote memorization. To address
deficiencies in existing approaches to this problem, we propose three
approaches that efficiently integrate a class prior into a 3D reconstruction
model, allowing it to account for intra-class variability and imposing an implicit
compositional structure that the model should learn. Experiments on the popular
ShapeNet database demonstrate that our method significantly outperforms existing
baselines on this task in the few-shot setting.
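The nearest-neighbor baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the feature vectors, the flattened binary voxel grids, and the function names below are all hypothetical, chosen only to show the retrieval idea of returning the training shape whose image feature lies closest to the query, scored here with intersection-over-union.

```python
from typing import List, Sequence


def iou(a: Sequence[int], b: Sequence[int]) -> float:
    """Intersection-over-union of two flattened binary occupancy grids."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest_neighbor_reconstruct(
    query_feat: Sequence[float],
    train_feats: List[Sequence[float]],
    train_shapes: List[List[int]],
) -> List[int]:
    """Return the stored shape whose image feature is closest to the query.

    No shape is synthesized: the 'reconstruction' is a pure lookup.
    """
    best = min(
        range(len(train_feats)),
        key=lambda i: squared_distance(query_feat, train_feats[i]),
    )
    return train_shapes[best]


# Toy data (assumed): 2-D image features and 4-voxel occupancy grids.
train_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_shapes = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
query_feat = [0.9, 1.2]
ground_truth = [0, 1, 1, 1]

prediction = nearest_neighbor_reconstruct(query_feat, train_feats, train_shapes)
score = iou(prediction, ground_truth)
```

Because such a baseline can only return shapes it has already stored, its quality depends on how well the training set covers the query category, which is consistent with the abstract's point that it does not carry over to novel categories with very few examples.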
Related papers
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network [18.000566656946475]
3D reconstruction of novel categories based on few-shot learning is appealing in real-world applications.
We present a Memory Prior Contrastive Network (MPCN) that can store shape prior knowledge in a few-shot learning based 3D reconstruction framework.
arXiv Detail & Related papers (2022-07-30T10:49:39Z)
- Stereo Neural Vernier Caliper [57.187088191829886]
We propose a new object-centric framework for learning-based stereo 3D object detection.
We tackle the problem of predicting a refined update given an initial 3D cuboid guess.
Our approach achieves state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2022-03-21T14:36:07Z)
- Learning Compositional Shape Priors for Few-Shot 3D Reconstruction [36.40776735291117]
We show that complex encoder-decoder architectures exploit large amounts of per-category data.
We propose three ways to learn a class-specific global shape prior, directly from data.
Experiments on the popular ShapeNet dataset show that our method outperforms a zero-shot baseline by over 40%.
arXiv Detail & Related papers (2021-06-11T14:55:49Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that jointly recovers the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
- Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.