Unsupervised Learning of 3D Object Categories from Videos in the Wild
- URL: http://arxiv.org/abs/2103.16552v1
- Date: Tue, 30 Mar 2021 17:57:01 GMT
- Title: Unsupervised Learning of 3D Object Categories from Videos in the Wild
- Authors: Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny
- Abstract summary: We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
- Score: 75.09720013151247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our goal is to learn a deep network that, given a small number of images of
an object of a given category, reconstructs it in 3D. While several recent
works have obtained analogous results using synthetic data or assuming the
availability of 2D primitives such as keypoints, we are interested in working
with challenging real data and with no manual annotations. We thus focus on
learning a model from multiple views of a large collection of object instances.
We contribute a new large dataset of object-centric videos suitable for
training and benchmarking this class of models. We show that existing
techniques leveraging meshes, voxels, or implicit surfaces, which work well for
reconstructing isolated objects, fail on this challenging data. Finally, we
propose a new neural network design, called warp-conditioned ray embedding
(WCR), which significantly improves reconstruction while obtaining a detailed
implicit representation of the object surface and texture, also compensating
for the noise in the initial SfM reconstruction that bootstrapped the learning
process. Our evaluation demonstrates performance improvements over several deep
monocular reconstruction baselines on existing benchmarks and on our novel
dataset.
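The abstract names warp-conditioned ray embedding (WCR) but does not describe its internals. As a rough illustration only, the sketch below shows a generic multi-view feature-sampling step of the kind the name suggests. 3D points (for example, samples along a target ray) are projected into each source view using cameras such as those from the SfM reconstruction. Per-view image features are bilinearly sampled at the projections and averaged over the views in which each point is visible. The function names, the pinhole model with world-to-camera extrinsics `x_cam = R @ X + t`, and mean aggregation are all assumptions made for illustration. This is not the authors' implementation.

```python
import numpy as np

# Hypothetical illustration of warp-conditioned feature sampling.
# NOT the authors' WCR implementation: camera model and aggregation are assumptions.


def project_points(points, K, R, t):
    """Project world points (N, 3) to pixel coords (N, 2) with a pinhole camera.

    Assumes world-to-camera extrinsics x_cam = R @ X + t and intrinsics K.
    Returns pixel coordinates (x, y) and camera-frame depth.
    """
    cam = points @ R.T + t                     # (N, 3) camera-frame coordinates
    depth = cam[:, 2]
    uvw = cam @ K.T                            # (N, 3) homogeneous pixel coordinates
    z = uvw[:, 2:3]
    z = np.where(np.abs(z) < 1e-8, 1e-8, z)    # guard against division by zero
    return uvw[:, :2] / z, depth


def bilinear_sample(feat, uv):
    """Bilinearly sample feat (H, W, C) at pixel coords uv (N, 2) given as (x, y).

    Returns (N, C) features and a boolean mask of points inside the image.
    """
    H, W, _ = feat.shape
    x, y = uv[:, 0], uv[:, 1]
    valid = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    x = np.clip(x, 0, W - 1)                   # clamp so indexing stays in bounds
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy, valid


def warp_conditioned_features(points, feats, cameras):
    """Average sampled features over source views where each point is visible.

    feats: list of (H, W, C) feature maps; cameras: list of (K, R, t) tuples.
    Points visible in no view get a zero feature vector.
    """
    C = feats[0].shape[-1]
    total = np.zeros((points.shape[0], C))
    count = np.zeros((points.shape[0], 1))
    for feat, (K, R, t) in zip(feats, cameras):
        uv, depth = project_points(points, K, R, t)
        sampled, valid = bilinear_sample(feat, uv)
        valid &= depth > 0                     # drop points behind the camera
        total += sampled * valid[:, None]
        count += valid[:, None]
    return total / np.maximum(count, 1)
```

Mean pooling is simply the most basic order-independent way to combine views; a learned aggregation would be an equally plausible choice, and the abstract does not say which one WCR uses.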
Related papers
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render-and-compare strategy that can be applied to novel objects.
We also introduce a novel approach for coarse pose estimation that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
(2022-12-13T19:30:03Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D mesh reconstruction is a fundamental computer vision task that aims to recover 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
(2022-08-04T14:13:35Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
Recent work on 3D pre-training fails when features learned on synthetic objects are transferred to real-world applications.
A promising solution is to make better use of synthetic datasets of CAD object models to boost learning on real datasets.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
(2021-08-17T17:56:12Z)
- Learning Compositional Shape Priors for Few-Shot 3D Reconstruction [36.40776735291117]
We show that complex encoder-decoder architectures exploit large amounts of per-category data.
We propose three ways to learn a class-specific global shape prior, directly from data.
Experiments on the popular ShapeNet dataset show that our method outperforms a zero-shot baseline by over 40%.
(2021-06-11T14:55:49Z)
- SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data [124.2624568006391]
We present SAIL-VOS 3D: a synthetic video dataset with frame-by-frame mesh annotations.
We also develop the first baselines for reconstruction of 3D meshes from video data via temporal models.
(2021-05-18T15:42:37Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
(2021-01-18T03:24:48Z)
- Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors [30.262308825799167]
We show that complex encoder-decoder architectures perform similarly to nearest-neighbor baselines in standard benchmarks.
We propose three approaches that efficiently integrate a class prior into a 3D reconstruction model.
(2020-04-14T04:53:34Z)