Explicit3D: Graph Network with Spatial Inference for Single Image 3D
Object Detection
- URL: http://arxiv.org/abs/2302.06494v3
- Date: Mon, 20 Nov 2023 08:44:23 GMT
- Title: Explicit3D: Graph Network with Spatial Inference for Single Image 3D
Object Detection
- Authors: Yanjun Liu and Wenming Yang
- Abstract summary: We propose a dynamic sparse graph pipeline named Explicit3D based on object geometry and semantics features.
Our experimental results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves a better performance balance than the state of the art.
- Score: 35.85544715234846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Indoor 3D object detection is an essential task in single image scene
understanding, impacting spatial cognition fundamentally in visual reasoning.
Existing works on 3D object detection from a single image either pursue this
goal through independent predictions of each object or implicitly reason over
all possible objects, failing to harness relational geometric information
between objects. To address this problem, we propose a dynamic sparse graph
pipeline named Explicit3D based on object geometry and semantics features.
Taking the efficiency into consideration, we further define a relatedness score
and design a novel dynamic pruning algorithm followed by a cluster sampling
method for sparse scene graph generation and updating. Furthermore, our
Explicit3D introduces homogeneous matrices and defines new relative loss and
corner loss to model the spatial difference between target pairs explicitly.
Instead of using ground-truth labels as direct supervision, our relative and
corner losses are derived from the homogeneous transformation, which enables the
model to learn the geometric consistency between objects. The experimental
results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves
a better performance balance than the state of the art.
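The abstract does not spell out the exact form of the relative and corner losses, so the following is only a minimal sketch of the general idea of supervising the relative homogeneous transform between an object pair. It assumes each box is described by a center, a yaw angle about the vertical (z) axis, and a size, and it uses simple L1 and Euclidean distances. All function names here (`pose_matrix`, `relative_loss`, `corner_loss`, and so on) are hypothetical and are not taken from the paper.

```python
import math

def pose_matrix(center, yaw):
    """4x4 homogeneous pose of a box: yaw about the z axis (assumed up axis) plus translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = center
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(t):
    """Inverse of a rigid transform: [R | p]^-1 = [R^T | -R^T p]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(rt[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rt[0] + [p[0]], rt[1] + [p[1]], rt[2] + [p[2]],
            [0.0, 0.0, 0.0, 1.0]]

def relative_transform(t_a, t_b):
    """Pose of object b expressed in the local frame of object a."""
    return matmul(rigid_inverse(t_a), t_b)

def box_corners(size):
    """Eight corners of a box centred at its own origin, as homogeneous points."""
    l, w, h = size
    return [[sx * l / 2, sy * w / 2, sz * h / 2, 1.0]
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def transform_points(t, pts):
    """Apply a 4x4 transform to homogeneous points and return 3D points."""
    return [[sum(t[i][k] * p[k] for k in range(4)) for i in range(3)]
            for p in pts]

def relative_loss(pred_a, pred_b, gt_a, gt_b):
    """Mean absolute difference between predicted and ground-truth relative transforms
    (an illustrative L1 choice, not the paper's definition)."""
    rp = relative_transform(pred_a, pred_b)
    rg = relative_transform(gt_a, gt_b)
    return sum(abs(rp[i][j] - rg[i][j])
               for i in range(3) for j in range(4)) / 12.0

def corner_loss(pred_a, pred_b, gt_a, gt_b, size_b):
    """Mean distance between the corners of box b as seen from a's frame,
    predicted versus ground truth (illustrative, not the paper's definition)."""
    corners = box_corners(size_b)
    cp = transform_points(relative_transform(pred_a, pred_b), corners)
    cg = transform_points(relative_transform(gt_a, gt_b), corners)
    return sum(math.dist(p, g) for p, g in zip(cp, cg)) / len(corners)

# Ground-truth pair, and a prediction where both boxes share the same 0.5 m offset.
gt_a = pose_matrix((0.0, 0.0, 0.0), 0.0)
gt_b = pose_matrix((1.0, 2.0, 0.0), math.pi / 2)
shifted_a = pose_matrix((0.5, 0.0, 0.0), 0.0)
shifted_b = pose_matrix((1.5, 2.0, 0.0), math.pi / 2)
# A prediction where only box b is displaced, so the pairwise geometry changes.
moved_b = pose_matrix((1.3, 2.0, 0.0), math.pi / 2)

consistent = relative_loss(shifted_a, shifted_b, gt_a, gt_b)
inconsistent = corner_loss(gt_a, moved_b, gt_a, gt_b, (1.0, 0.5, 0.8))
```

In this sketch `consistent` is zero up to floating-point error, because a shared offset leaves the pairwise geometry intact, while `inconsistent` is positive because only one box moved. This is the sense in which a pairwise term measures geometric consistency between objects rather than absolute box error.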
Related papers
- Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering [35.69582529609475]
This paper introduces a novel neural algorithm for parameterizing neural implicit surfaces to simple parametric domains like spheres and polycubes.
It computes bi-directional deformation between the object and the domain using a forward mapping from the object's zero level set and an inverse deformation for backward mapping.
We demonstrate the method's effectiveness on images of human heads and man-made objects.
arXiv Detail & Related papers (2023-10-09T08:42:40Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- Object DGCNN: 3D Object Detection using Dynamic Graphs [32.090268859180334]
3D object detection often involves complicated training and testing pipelines.
Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds.
arXiv Detail & Related papers (2021-10-13T17:59:38Z)
- Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects [115.71874459429381]
We address the novel task of jointly reconstructing the 3D shape, texture, and motion of an object from a single motion-blurred image.
While previous approaches address the deblurring problem only in the 2D image domain, our proposed rigorous modeling of all object properties in the 3D domain enables the correct description of arbitrary object motion.
arXiv Detail & Related papers (2021-06-16T13:18:08Z)
- Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections.
In addition, we obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimensions, and orientation, and then combine these estimates with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)