Stereo Object Matching Network
- URL: http://arxiv.org/abs/2103.12498v1
- Date: Tue, 23 Mar 2021 12:54:43 GMT
- Title: Stereo Object Matching Network
- Authors: Jaesung Choe, Kyungdon Joo, Francois Rameau, In So Kweon
- Abstract summary: This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
- Score: 78.35697025102334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information. Unlike existing stereo matching methods that focus exclusively on the pixel-level correspondence between stereo images within a volumetric space (i.e., the cost volume), we exploit this volumetric structure in a different manner. The cost volume explicitly encompasses 3D information along its disparity axis, so it is a privileged structure that can encapsulate 3D contextual information from objects. However, doing so is not straightforward, since disparity values map to the 3D metric space in a non-linear fashion. We therefore present two novel strategies for handling 3D objectness in the cost-volume space: selective sampling (RoISelect) and 2D-3D fusion (fusion-by-occupancy), which allow us to seamlessly incorporate 3D object-level information and achieve accurate depth estimation near object boundary regions. Our depth estimation achieves competitive performance on the KITTI dataset and the Virtual-KITTI 2.0 dataset.
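The non-linearity noted in the abstract comes from the standard pinhole-stereo relation Z = f·B/d, where f is the focal length, B the baseline, and d the disparity. The minimal Python sketch below (focal length and baseline values are illustrative assumptions, not taken from the paper) shows why uniformly spaced disparity bins in a cost volume correspond to increasingly coarse depth intervals at long range:

```python
import numpy as np

# Illustrative stereo parameters (assumptions, not from the paper):
# focal length in pixels and baseline in meters, roughly KITTI-like.
focal_px = 720.0
baseline_m = 0.54

# Uniformly spaced disparity bins, as used along the disparity axis of a cost volume.
disparities = np.arange(1, 193, dtype=np.float64)  # [1, 192] pixels

# Standard pinhole-stereo relation: depth Z = f * B / d.
depths = focal_px * baseline_m / disparities

# The depth step between adjacent disparity bins grows quadratically with depth,
# which is why uniform disparity sampling is non-linear in metric space.
depth_steps = np.abs(np.diff(depths))
print(f"depth at d=192: {depths[-1]:.2f} m, step to previous bin: {depth_steps[-1]:.4f} m")
print(f"depth at d=2:   {depths[1]:.2f} m, step to previous bin: {depth_steps[0]:.2f} m")
```

Because of this spacing, object-level reasoning in the cost volume cannot treat disparity slices as uniformly spaced depth slices, which motivates strategies such as the paper's RoISelect and fusion-by-occupancy.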
Related papers
- CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction [2.0375637582248136]
Cross-View Center Point-Fusion is a state-of-the-art model for 3D object detection.
Our architecture draws on two previously established algorithms, Cross-View Transformers and CenterPoint.
arXiv Detail & Related papers (2024-10-15T02:55:07Z) - 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z) - Generating Visual Spatial Description via Holistic 3D Scene Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target-object-centered 3D spatial scene graph (Go3D-S2G) to model the spatial semantics of target objects within the holistic 3D scene.
arXiv Detail & Related papers (2023-05-19T15:53:56Z) - Vox-E: Text-guided Voxel Editing of 3D Objects [14.88446525549421]
Large-scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images.
We present a technique that harnesses the power of latent diffusion models for editing existing 3D objects.
arXiv Detail & Related papers (2023-03-21T17:36:36Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - OCM3D: Object-Centric Monocular 3D Object Detection [35.804542148335706]
We propose a novel object-centric voxel representation tailored for monocular 3D object detection.
Specifically, voxels are built on each object proposal, and their sizes are adaptively determined by the 3D spatial distribution of the points.
Our method outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-04-13T09:15:40Z) - Stereo CenterNet based 3D Object Detection for Autonomous Driving [2.508414661327797]
We propose a 3D object detection method using geometric information in stereo images, called Stereo CenterNet.
Stereo CenterNet predicts the four semantic key points of the object's 3D bounding box in space and uses the 2D left and right boxes, 3D dimensions, orientation, and key points to restore the object's bounding box in 3D space.
Experiments conducted on the KITTI dataset show that our method achieves the best speed-accuracy trade-off compared with the state-of-the-art methods based on stereo geometry.
arXiv Detail & Related papers (2021-03-20T02:18:49Z) - Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds (see the partitioning sketch after this list).
arXiv Detail & Related papers (2020-08-04T13:56:19Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
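As a rough illustration of the cylindrical partitioning idea mentioned in the Cylinder3D entry above, the sketch below assigns LiDAR points to (rho, phi, z) voxels. The ranges and grid resolution are assumptions chosen for illustration, not the paper's actual configuration:

```python
import numpy as np

def cylindrical_voxel_indices(points_xyz,
                              rho_range=(0.0, 50.0),
                              z_range=(-4.0, 2.0),
                              grid=(480, 360, 32)):
    """Assign each point to a cylindrical voxel index (rho, phi, z).

    The ranges and grid resolution here are illustrative assumptions; the
    actual Cylinder3D configuration may differ.
    """
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)   # radial distance from the sensor
    phi = np.arctan2(y, x)           # azimuth angle in [-pi, pi]

    n_rho, n_phi, n_z = grid
    rho_idx = np.clip(((rho - rho_range[0]) / (rho_range[1] - rho_range[0])
                       * n_rho).astype(int), 0, n_rho - 1)
    phi_idx = np.clip(((phi + np.pi) / (2 * np.pi) * n_phi).astype(int),
                      0, n_phi - 1)
    z_idx = np.clip(((z - z_range[0]) / (z_range[1] - z_range[0])
                     * n_z).astype(int), 0, n_z - 1)
    return np.stack([rho_idx, phi_idx, z_idx], axis=1)

# Example: random points standing in for a LiDAR sweep.
pts = np.random.uniform(low=[-50, -50, -4], high=[50, 50, 2], size=(1000, 3))
voxel_ids = cylindrical_voxel_indices(pts)
print(voxel_ids.shape)  # (1000, 3)
```

Compared with a uniform Cartesian grid, cylindrical bins grow with radial distance, which keeps the point distribution across voxels more balanced for sparse far-field LiDAR returns.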
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.