3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations
- URL: http://arxiv.org/abs/2010.16279v1
- Date: Fri, 30 Oct 2020 13:56:09 GMT
- Title: 3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations
- Authors: Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley,
Shubhankar Potdar, Katerina Fragkiadaki
- Abstract summary: We propose a system that learns to detect objects and infer their 3D poses in RGB-D images.
Many existing systems can identify objects and infer 3D poses, but they heavily rely on human labels and 3D annotations.
- Score: 29.61554189447989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a system that learns to detect objects and infer their 3D poses in
RGB-D images. Many existing systems can identify objects and infer 3D poses,
but they heavily rely on human labels and 3D annotations. The challenge here is
to achieve this without relying on strong supervision signals. To address this
challenge, we propose a model that maps RGB-D images to a set of 3D visual
feature maps in a differentiable fully-convolutional manner, supervised by
predicting views. The 3D feature maps correspond to a featurization of the 3D
world scene depicted in the images. The object 3D feature representations are
invariant to camera viewpoint changes or zooms, which means feature matching
can identify similar objects under different camera viewpoints. We can compare
the 3D feature maps of two objects by searching alignment across scales and 3D
rotations, and, as a result, we can estimate pose and scale
changes without the need for 3D pose annotations. We cluster object feature
maps into a set of 3D prototypes that represent familiar objects in canonical
scales and orientations. We then parse images by inferring the prototype
identity and 3D pose for each detected object. We compare our method to
numerous baselines that do not learn 3D feature visual representations or do
not attempt to correspond features across scenes, and outperform them by a
large margin in the tasks of object retrieval and object pose estimation.
Thanks to the 3D nature of the object-centric feature maps, the visual
similarity cues are invariant to 3D pose changes or small scale changes, which
gives our method an advantage over 2D and 1D methods.
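The alignment-and-quantization idea described above can be illustrated with a minimal sketch (not the authors' code): given two 3D feature volumes, search over a set of candidate rotations for the one maximizing normalized cross-correlation, then assign a query volume to its best-matching prototype. The 90-degree rotation grid, random features, and all function names here are illustrative assumptions; the paper's system searches continuous 3D rotations and scales over learned features.

```python
import numpy as np

def rotate_volume_z90(vol, k):
    # Rotate a (C, D, H, W) feature volume by k * 90 degrees in the H-W plane.
    # Stand-in for the paper's continuous 3D rotation search.
    return np.rot90(vol, k=k, axes=(2, 3))

def normalize(vol):
    # Scale a volume to unit L2 norm so correlations are comparable.
    return vol / (np.linalg.norm(vol) + 1e-8)

def best_alignment(query, proto, rotations=(0, 1, 2, 3)):
    # Search over candidate rotations of the prototype and keep the one
    # whose normalized cross-correlation with the query is highest.
    q = normalize(query)
    best_k, best_score = 0, -np.inf
    for k in rotations:
        score = float(np.sum(q * normalize(rotate_volume_z90(proto, k))))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

def nearest_prototype(query, prototypes):
    # Quantization step: assign the query to the prototype with the best
    # aligned correlation, returning (prototype index, rotation index).
    results = [best_alignment(query, p) for p in prototypes]
    idx = int(np.argmax([score for _, score in results]))
    return idx, results[idx][0]
```

Because the correlation is computed after alignment, the match score is invariant to the rotations in the search set, which is the property the abstract credits for viewpoint-invariant retrieval.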
Related papers
- ImageNet3D: Towards General-Purpose Object-Level 3D Understanding [20.837297477080945]
We present ImageNet3D, a large dataset for general-purpose object-level 3D understanding.
ImageNet3D augments 200 categories from the ImageNet dataset with 2D bounding box, 3D pose, and 3D location annotations, plus image captions interleaved with 3D information.
We consider two new tasks, probing of object-level 3D awareness and open vocabulary pose estimation, besides standard classification and pose estimation.
arXiv Detail & Related papers (2024-06-13T22:44:26Z)
- Generating Visual Spatial Description via Holistic 3D Scene Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
arXiv Detail & Related papers (2021-12-19T17:10:40Z)
- Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image which is aligned with 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
arXiv Detail & Related papers (2021-11-04T18:30:37Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- CoCoNets: Continuous Contrastive 3D Scene Representations [21.906643302668716]
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos.
We show the resulting 3D visual feature representations effectively scale across objects and scenes, imagine information occluded or missing from the input viewpoints, track objects over time, align semantically related objects in 3D, and improve 3D object detection.
arXiv Detail & Related papers (2021-04-08T15:50:47Z)
- Disentangling 3D Prototypical Networks For Few-Shot Concept Learning [29.02523358573336]
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene.
Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay.
arXiv Detail & Related papers (2020-11-06T14:08:27Z)
- 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings [35.769234123059086]
We present an approach for detecting and estimating the 3D poses of objects in images that requires only an untextured CAD model.
Our approach combines Deep Learning and 3D geometry: It relies on an embedding of local 3D geometry to match the CAD models to the input images.
We show that we can use Mask-RCNN in a class-agnostic way to detect the new objects without retraining and thus drastically limit the number of possible correspondences.
arXiv Detail & Related papers (2020-10-08T15:57:06Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.