Diverse Plausible Shape Completions from Ambiguous Depth Images
- URL: http://arxiv.org/abs/2011.09390v1
- Date: Wed, 18 Nov 2020 16:42:51 GMT
- Title: Diverse Plausible Shape Completions from Ambiguous Depth Images
- Authors: Brad Saund and Dmitry Berenson
- Abstract summary: PSSNet is a network architecture for generating plausible 3D reconstructions from a single 2.5D depth image.
We perform experiments using ShapeNet mugs and partially-occluded YCB objects and find that our method performs comparably on datasets with little ambiguity and outperforms existing methods when many shapes plausibly fit an observed depth image.
- Score: 7.652701739127332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose PSSNet, a network architecture for generating diverse plausible 3D reconstructions from a single 2.5D depth image. Existing methods tend to produce only small variations on a single shape, even when multiple shapes are consistent with an observation. To obtain diversity, we alter a variational autoencoder by providing a learned shape bounding box feature as side information during training. Since these features are known during training, we are able to add a supervised loss to the encoder and provide noiseless values to the decoder. To evaluate, we sample a set of completions from a network, construct a set of plausible shape matches for each test observation, and compare using our plausible diversity metric defined over sets of shapes. We perform experiments using ShapeNet mugs and partially-occluded YCB objects and find that our method performs comparably on datasets with little ambiguity, and outperforms existing methods when many shapes plausibly fit an observed depth image. We demonstrate one use for PSSNet on a physical robot when grasping objects in occlusion and clutter.
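The training trick described in the abstract (a supervised loss on the encoder's side-information prediction, with noiseless ground-truth values fed to the decoder) can be illustrated with a short sketch. The following is a minimal PyTorch rendition under assumed shapes and layer sizes, not the authors' released architecture; names such as SideInfoVAE and box_dim, and the 64-voxel grid resolution, are illustrative assumptions.

```python
# Minimal sketch of a VAE with learned side information, in the spirit of the
# abstract's description. Grid resolution (64^3), layer sizes, and names such
# as SideInfoVAE and box_dim are illustrative assumptions, not the authors'
# released architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideInfoVAE(nn.Module):
    def __init__(self, latent_dim=64, box_dim=24):
        super().__init__()
        # 3D conv encoder over a partial occupancy grid built from the depth image.
        self.enc = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * 16 ** 3  # 64^3 input halved twice by the strided convs
        self.to_mu = nn.Linear(feat, latent_dim)
        self.to_logvar = nn.Linear(feat, latent_dim)
        # Encoder head that predicts the bounding-box side-information feature.
        self.to_box = nn.Linear(feat, box_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + box_dim, feat), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16, 16)),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, partial_occ, box_gt=None):
        h = self.enc(partial_occ)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        box_pred = self.to_box(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Training: the decoder receives the noiseless ground-truth box feature.
        # Testing: it falls back on the encoder's prediction.
        box = box_gt if box_gt is not None else box_pred
        logits = self.dec(torch.cat([z, box], dim=1))
        return logits, mu, logvar, box_pred

def vae_loss(logits, target_occ, mu, logvar, box_pred, box_gt,
             beta=1.0, gamma=1.0):
    recon = F.binary_cross_entropy_with_logits(logits, target_occ)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    side = F.mse_loss(box_pred, box_gt)  # supervised loss on the encoder head
    return recon + beta * kld + gamma * side
```

At test time, holding the partial input fixed and drawing several latent samples z yields a set of diverse completions, which is what the set-based evaluation below consumes.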
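For evaluation, the abstract compares a sampled set of completions against a set of plausible shape matches using a plausible diversity metric defined over sets of shapes. The paper's exact formula is not reproduced here, so the sketch below is a hypothetical stand-in: a symmetric best-match score built on an assumed pairwise shape distance (1 minus voxel IoU), where lower is better.

```python
# Hypothetical set-to-set score in the spirit of "plausible diversity".
# The symmetric best-match formulation and the 1 - IoU shape distance are
# assumptions for illustration, not the paper's exact definition.
import numpy as np

def shape_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Distance between two boolean occupancy grids as 1 - IoU."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def plausible_diversity(samples, plausibles):
    """Sum of two best-match averages over the pairwise distance matrix:
    coverage     -- every plausible shape should be near some sample;
    plausibility -- every sample should be near some plausible shape."""
    d = np.array([[shape_distance(s, p) for p in plausibles] for s in samples])
    coverage = d.min(axis=0).mean()       # plausible -> nearest sample
    plausibility = d.min(axis=1).mean()   # sample -> nearest plausible
    return coverage + plausibility
```

Under this stand-in, a method that produces only small variations on a single shape is penalized on the coverage term even if each sample is individually plausible, mirroring the failure mode the abstract describes.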
Related papers
- Self-supervised 3D Point Cloud Completion via Multi-view Adversarial Learning [61.14132533712537]
We propose MAL-SPC, a framework that effectively leverages both object-level and category-specific geometric similarities to complete missing structures.
Our MAL-SPC does not require any complete 3D supervision and needs only a single partial point cloud per object.
arXiv Detail & Related papers (2024-07-13T06:53:39Z)
- 3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets [34.610546020800236]
3DMiner is a pipeline for mining 3D shapes from challenging datasets.
Our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques.
We show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset.
arXiv Detail & Related papers (2023-10-29T23:08:19Z)
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Zero in on Shape: A Generic 2D-3D Instance Similarity Metric learned from Synthetic Data [3.71630298053787]
We present a network architecture which compares RGB images and untextured 3D models by the similarity of the represented shape.
Our system is optimised for zero-shot retrieval, meaning it can recognise shapes never shown in training.
arXiv Detail & Related papers (2021-08-09T14:44:08Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method outperforms the state of the art by 5% on object detection in ScanNet scenes and by 3.4% on the Open dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.