Shape and Viewpoint without Keypoints
- URL: http://arxiv.org/abs/2007.10982v1
- Date: Tue, 21 Jul 2020 17:58:28 GMT
- Title: Shape and Viewpoint without Keypoints
- Authors: Shubham Goel, Angjoo Kanazawa, Jitendra Malik
- Abstract summary: We present a learning framework that recovers the 3D shape, pose, and texture of an object from a single image.
The model is trained on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
- Score: 63.26977130704171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a learning framework that learns to recover the 3D shape, pose and
texture from a single image, trained on an image collection without any ground
truth 3D shape, multi-view, camera viewpoints or keypoint supervision. We
approach this highly under-constrained problem in an "analysis by synthesis"
framework where the goal is to predict the likely shape, texture and camera
viewpoint that could produce the image with various learned category-specific
priors. Our particular contribution in this paper is a representation of the
distribution over cameras, which we call "camera-multiplex". Instead of picking
a point estimate, we maintain a set of camera hypotheses that are optimized
during training to best explain the image given the current shape and texture.
We call our approach Unsupervised Category-Specific Mesh Reconstruction
(U-CMR), and present qualitative and quantitative results on CUB, Pascal 3D and
new web-scraped datasets. We obtain state-of-the-art camera prediction results
and show that we can learn to predict diverse shapes and textures across
objects using an image collection without any keypoint annotations or 3D ground
truth. Project page: https://shubham-goel.github.io/ucmr
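To make the camera-multiplex concrete, here is a minimal sketch of keeping and optimizing a set of per-image camera hypotheses, written in PyTorch. The five-parameter weak-perspective camera, the soft weighting, and the `render_loss` callback are illustrative assumptions, not the authors' released code (see the project page for that).

```python
import torch

K = 8  # camera hypotheses kept per training image

def init_multiplex(num_images, k=K):
    # One row per hypothesis: [scale, tx, ty, azimuth, elevation]
    # (an assumed weak-perspective parameterization).
    cams = torch.randn(num_images, k, 5) * 0.1
    cams[..., 0] += 1.0  # start scales near 1
    return cams.requires_grad_()  # optimized alongside the network

def multiplex_loss(cams_i, shape, texture, image, render_loss):
    # render_loss(shape, texture, cam, image) -> scalar reconstruction
    # loss (e.g., silhouette + photometric) under one camera hypothesis.
    losses = torch.stack([render_loss(shape, texture, cams_i[k], image)
                          for k in range(cams_i.shape[0])])
    # Weight each hypothesis by how well it currently explains the
    # image, so the expected loss favors good cameras without making
    # a hard (and possibly wrong) point estimate.
    weights = torch.softmax(-losses.detach(), dim=0)
    return (weights * losses).sum()
```

At the end of training, the paper distills the best per-image camera into a feed-forward predictor, which is what yields the camera-prediction results reported above.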
Related papers
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops [17.074716363691294]
Models for predicting 3D from a single image often work with crops around the object of interest and ignore the location of the object in the camera's field of view.
We propose Intrinsics-Aware Positional Encoding (KPE), which incorporates information about the location of crops in the image and the camera intrinsics.
Experiments on three popular 3D-from-a-single-image benchmarks (depth prediction on NYU, 3D object detection on KITTI and nuScenes, and 3D prediction of articulated objects on ARCTIC) show the benefits of KPE.
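To illustrate what an intrinsics-aware encoding of crop location can look like, the sketch below maps crop pixels to viewing-ray directions in the full image's camera frame, so identical crops taken from different parts of the image encode differently. The function name, arguments, and resolution are hypothetical, not the paper's API.

```python
import numpy as np

def crop_ray_encoding(crop_x0, crop_y0, crop_w, crop_h,
                      fx, fy, cx, cy, out=16):
    # Pixel centers of an out-by-out grid over the crop, expressed in
    # full-image coordinates so the crop's location is preserved.
    us = crop_x0 + (np.arange(out) + 0.5) * crop_w / out
    vs = crop_y0 + (np.arange(out) + 0.5) * crop_h / out
    u, v = np.meshgrid(us, vs)
    # Back-project through the intrinsics to unit viewing rays.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return rays  # (out, out, 3), to concatenate with image features
```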
arXiv Detail & Related papers (2023-12-11T18:28:55Z)
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM reconstructs the object and estimates the relative camera poses simultaneously, in about 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
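As a rough sketch of the deform-a-prior idea, assuming the common recipe of a fixed category template plus image-conditioned per-vertex offsets (all names and sizes here are invented for illustration):

```python
import torch
import torch.nn as nn

class PriorDeformer(nn.Module):
    # A frozen template mesh (e.g., fit to synthetic data) is deformed
    # by per-vertex offsets regressed from image features.
    def __init__(self, template_verts, feat_dim=256):
        super().__init__()
        self.register_buffer("template", template_verts)  # (V, 3)
        self.offset_net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, template_verts.numel()))

    def forward(self, img_feat):  # img_feat: (B, feat_dim)
        offsets = self.offset_net(img_feat).view(-1, *self.template.shape)
        return self.template + offsets  # (B, V, 3) deformed shapes
```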
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and to errors in camera pose, and can be combined with a differentiable renderer for test-time optimization.
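A minimal sketch of cross-view refinement with a graph convolution, assuming per-vertex image features have already been gathered by projecting each vertex into every view; the layer, the mean pooling over views, and the offset head are illustrative rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    # Plain mesh graph convolution: each vertex mixes its own features
    # with an adjacency-weighted average of its neighbors' features.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_nbr = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):  # x: (V, in_dim); adj: (V, V), row-normalized
        return torch.relu(self.w_self(x) + self.w_nbr(adj @ x))

def refine_vertices(verts, adj, feats_per_view, gcn, head):
    # feats_per_view: list of (V, F) features, one per view. Cross-view
    # pooling is a simple mean here; gcn must accept 3 + F input dims
    # and head maps its output to a per-vertex 3D offset.
    pooled = torch.stack(feats_per_view, dim=0).mean(dim=0)
    h = gcn(torch.cat([verts, pooled], dim=-1), adj)
    return verts + head(h)  # refined vertex positions
```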
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
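A rough sketch of a recurrent encoder that emits one latent per object slot and splits it into shape, pose, and texture codes; the GRU cell, dimensions, and fixed slot count are assumptions for illustration, not PriSMONet's architecture.

```python
import torch
import torch.nn as nn

class RecurrentSceneEncoder(nn.Module):
    def __init__(self, feat_dim=256, z_dim=96, max_objects=4):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, z_dim)
        self.z_dim = z_dim
        self.max_objects = max_objects

    def forward(self, img_feat):  # img_feat: (B, feat_dim)
        h = img_feat.new_zeros(img_feat.shape[0], self.z_dim)
        slots = []
        for _ in range(self.max_objects):
            h = self.gru(img_feat, h)  # one recurrent step per object
            # Split the latent into (shape, pose, texture) codes.
            slots.append(h.chunk(3, dim=-1))
        return slots
```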
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks [53.71440550507745]
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision.
We propose a deep learning technique for 3D object reconstruction from a single image.
We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner.
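One common self-supervised signal in this family, sketched under simplifying assumptions: project the predicted point cloud with the predicted pose and compare it to points sampled from the object silhouette. The orthographic camera and Chamfer matching below are a toy stand-in for a differentiable point-cloud projection.

```python
import torch

def projection_consistency_loss(points, azim, mask_pts):
    # points: (N, 3) predicted cloud; azim: predicted azimuth (radians,
    # 0-dim tensor); mask_pts: (M, 2) points sampled inside the object
    # silhouette. Rotate about the y-axis, drop depth, then match with
    # a symmetric Chamfer distance. Gradients flow to points and azim.
    ca, sa = torch.cos(azim), torch.sin(azim)
    zero = torch.zeros_like(ca)
    rot = torch.stack([torch.stack([ca, zero, sa]),
                       torch.tensor([0.0, 1.0, 0.0]),
                       torch.stack([-sa, zero, ca])])
    proj = (points @ rot.T)[:, :2]   # orthographic projection
    d = torch.cdist(proj, mask_pts)  # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```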
arXiv Detail & Related papers (2020-05-05T04:25:16Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.