On the generalization of learning-based 3D reconstruction
- URL: http://arxiv.org/abs/2006.15427v1
- Date: Sat, 27 Jun 2020 18:53:41 GMT
- Title: On the generalization of learning-based 3D reconstruction
- Authors: Miguel Angel Bautista, Walter Talbott, Shuangfei Zhai, Nitish
Srivastava, Joshua M Susskind
- Abstract summary: We study the inductive biases encoded in the model architecture that impact the generalization of learning-based 3D reconstruction methods.
We find that three inductive biases impact performance: the spatial extent of the encoder, the use of the underlying geometry of the scene to describe point features, and the mechanism to aggregate information from multiple views.
- Score: 10.516860541554632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art learning-based monocular 3D reconstruction methods learn
priors over object categories on the training set, and as a result struggle to
achieve reasonable generalization to object categories unseen during training.
In this paper we study the inductive biases encoded in the model architecture
that impact the generalization of learning-based 3D reconstruction methods. We
find that three inductive biases impact performance: the spatial extent of the
encoder, the use of the underlying geometry of the scene to describe point
features, and the mechanism to aggregate information from multiple views.
Additionally, we propose mechanisms to enforce those inductive biases: a point
representation that is aware of camera position, and a variance cost to
aggregate information across views. Our model achieves state-of-the-art results
on the standard ShapeNet 3D reconstruction benchmark in various settings.
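As a rough illustration of the two proposed mechanisms, here is a minimal PyTorch sketch (function names and tensor shapes are assumptions for illustration, not the paper's API): query points are expressed in each camera's coordinate frame, and per-view point features are fused by a mean while their cross-view variance is penalized.

```python
import torch

def camera_aware_points(points_world, cam_R, cam_t):
    """Express query points in each camera's coordinate frame.

    Shapes are assumptions for illustration, not the paper's API:
      points_world: (N, 3) query points in world coordinates
      cam_R:        (V, 3, 3) world-to-camera rotations
      cam_t:        (V, 3)   world-to-camera translations
    Returns (V, N, 3): each point expressed relative to each camera,
    making the point representation aware of camera position.
    """
    # x_cam = R @ x_world + t, broadcast over V views and N points
    return torch.einsum("vij,nj->vni", cam_R, points_world) + cam_t[:, None, :]

def aggregate_with_variance_cost(view_features):
    """Fuse per-view point features and penalize cross-view disagreement.

    view_features: (V, N, C) features of N query points seen from V views.
    Returns the mean feature (N, C) and a scalar variance cost that can
    be added to the training loss to encourage view-consistent features.
    """
    fused = view_features.mean(dim=0)
    variance_cost = view_features.var(dim=0, unbiased=False).mean()
    return fused, variance_cost
```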
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent advantage over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our loss functions enable the model to autonomously recover domain-specific scale-and-shift coefficients (a generic recovery of this kind is sketched below).
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
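The depth-estimation entry above mentions recovering domain-specific scale-and-shift coefficients. A standard closed-form least-squares alignment of this kind (a generic illustration; the paper's actual loss may differ) can be sketched as:

```python
import torch

def align_scale_shift(pred_depth, gt_depth, mask):
    """Recover scale s and shift t minimizing ||s * pred + t - gt||^2
    over valid pixels, then apply them to the prediction.

    pred_depth, gt_depth: (H, W) tensors; mask: (H, W) bool of valid pixels.
    """
    p = pred_depth[mask]
    g = gt_depth[mask]
    A = torch.stack([p, torch.ones_like(p)], dim=1)       # (M, 2) design matrix
    sol = torch.linalg.lstsq(A, g.unsqueeze(1)).solution  # [[s], [t]]
    s, t = sol[0, 0], sol[1, 0]
    return s * pred_depth + t
```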
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D mesh reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction with a focus on model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction [47.38670633513938]
We learn a unified model for single-view 3D reconstruction of objects from hundreds of semantic categories.
Our work relies on segmented image collections for learning the 3D shape of generic categories.
arXiv Detail & Related papers (2022-04-07T17:59:25Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision enforces consistency between consecutive 3D reconstructions via a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles (a generic handle-based deformation is sketched below).
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
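To make the handle idea in the entry above concrete, a generic linear handle-based deformation (a sketch under assumed names and shapes, not necessarily the paper's parameterization) could look like:

```python
import torch

def deform_with_handles(template_verts, weights, handle_disp):
    """Deform a template surface via a small set of local handles.

    template_verts: (N, 3) template mesh vertices
    weights:        (N, K) influence of each of K handles on each vertex
    handle_disp:    (K, 3) learnable displacement of each handle
    Returns (N, 3): each vertex moved by the weighted sum of handle
    displacements.
    """
    return template_verts + weights @ handle_disp
```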
- Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors [30.262308825799167]
We show that complex encoder-decoder architectures perform similarly to nearest-neighbor baselines on standard benchmarks.
We propose three approaches that efficiently integrate a class prior into a 3D reconstruction model.
arXiv Detail & Related papers (2020-04-14T04:53:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.