OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object
Detection
- URL: http://arxiv.org/abs/2312.08876v1
- Date: Tue, 12 Dec 2023 07:49:30 GMT
- Title: OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object
Detection
- Authors: Hu Zhang, Jianhua Xu, Tao Tang, Haiyang Sun, Xin Yu, Zi Huang,
Kaicheng Yu
- Abstract summary: OpenSight is a more advanced 2D-3D modeling framework for LiDAR-based open-vocabulary detection.
Our method establishes state-of-the-art open-vocabulary performance on widely used 3D detection benchmarks.
- Score: 41.24059083441953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional LiDAR-based object detection research primarily focuses on
closed-set scenarios, a setting that falls short in complex real-world
applications. Directly transferring existing 2D open-vocabulary models,
together with some known LiDAR classes, to obtain open-vocabulary ability,
however, tends to overfit: the resulting model detects only the known objects,
even when presented with a novel category. In this paper, we propose OpenSight, a more advanced 2D-3D
modeling framework for LiDAR-based open-vocabulary detection. OpenSight
utilizes 2D-3D geometric priors for the initial discernment and localization of
generic objects, followed by a more specific semantic interpretation of the
detected objects. The process begins by generating 2D boxes for generic objects
from the camera images accompanying the LiDAR data. These 2D boxes, together with
LiDAR points, are then lifted back into the LiDAR space to estimate
corresponding 3D boxes. For better generic object perception, our framework
integrates both temporal and spatial-aware constraints. Temporal awareness
correlates the predicted 3D boxes across consecutive timestamps, recalibrating
the missed or inaccurate boxes. The spatial awareness randomly places some
"precisely" estimated 3D boxes at varying distances, increasing the
visibility of generic objects. To interpret the specific semantics of detected
objects, we develop a cross-modal alignment and fusion module to first align 3D
features with 2D image embeddings and then fuse the aligned 3D-2D features for
semantic decoding. Our experiments indicate that our method establishes
state-of-the-art open-vocabulary performance on widely used 3D detection
benchmarks and effectively identifies objects for new categories of interest.
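To make the lifting step concrete, the sketch below shows one plausible reading of it in Python: LiDAR points are projected into the image, the points falling inside a 2D box form a frustum, and a coarse 3D box is fitted to them. The function names, calibration conventions, and axis-aligned box fit are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of the 2D-to-3D lifting step described in the abstract.
# All names and the box-fitting heuristic are assumptions, not the paper's code.
import numpy as np

def project_points(points_lidar, T_cam_from_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates; return pixels, camera-frame
    points, and the mask of points in front of the camera."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]   # LiDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0                      # keep points ahead of the camera
    uv = (K @ pts_cam[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                       # perspective division
    return uv, pts_cam[in_front], in_front

def lift_box_to_3d(box2d, points_lidar, T_cam_from_lidar, K):
    """Collect LiDAR points whose projection falls inside a 2D box (a frustum),
    then fit a coarse axis-aligned 3D box to them."""
    u0, v0, u1, v1 = box2d
    uv, _, mask = project_points(points_lidar, T_cam_from_lidar, K)
    inside = (uv[:, 0] >= u0) & (uv[:, 0] <= u1) & (uv[:, 1] >= v0) & (uv[:, 1] <= v1)
    frustum_pts = points_lidar[mask][inside]
    if len(frustum_pts) < 5:                          # too sparse to estimate a box
        return None
    lo, hi = frustum_pts.min(0), frustum_pts.max(0)
    center, size = (lo + hi) / 2, hi - lo             # crude extent estimate
    return np.concatenate([center, size])             # (x, y, z, l, w, h)
```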
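The temporal-awareness idea, correlating predicted 3D boxes across consecutive timestamps to recover missed ones, could look roughly like the following; the center-distance matching rule and the 2 m threshold are assumptions for illustration, not the paper's procedure.

```python
# Hedged sketch of temporal recalibration: boxes from the previous frame are
# warped into the current frame by the ego-motion transform, and any box with
# no nearby match in the current predictions is carried forward.
import numpy as np

def recalibrate_with_previous(boxes_t, boxes_prev, T_t_from_prev, dist_thresh=2.0):
    """boxes_*: (N, 6) arrays of (x, y, z, l, w, h); T_t_from_prev: 4x4 ego transform."""
    if len(boxes_prev) == 0:
        return boxes_t
    centers = np.hstack([boxes_prev[:, :3], np.ones((len(boxes_prev), 1))])
    centers = (T_t_from_prev @ centers.T).T[:, :3]    # warp into the current frame
    recovered = []
    for c_prev, box_prev in zip(centers, boxes_prev):
        unmatched = (len(boxes_t) == 0 or
                     np.linalg.norm(boxes_t[:, :3] - c_prev, axis=1).min() > dist_thresh)
        if unmatched:                                  # likely a missed detection
            recovered.append(np.concatenate([c_prev, box_prev[3:]]))
    return np.vstack([boxes_t] + recovered) if recovered else boxes_t
```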
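Likewise, the spatial-awareness step, which re-places "precisely" estimated 3D boxes at varying distances to increase the visibility of generic objects, might resemble this translation-only augmentation sketch; the sampling range and placement rule are hypothetical.

```python
# Sketch of the spatial-awareness augmentation: a confidently estimated 3D box
# and its interior points are translated to a random range and heading around
# the ego vehicle. The range limits are illustrative assumptions.
import numpy as np

def replace_at_distance(box, points_in_box, rng, r_min=10.0, r_max=50.0):
    """Translate a (x, y, z, l, w, h) box and its points to a random placement."""
    r = rng.uniform(r_min, r_max)                     # new distance from the sensor
    theta = rng.uniform(0.0, 2.0 * np.pi)             # new heading around the ego vehicle
    new_center = np.array([r * np.cos(theta), r * np.sin(theta), box[2]])
    shift = new_center - box[:3]
    return np.concatenate([new_center, box[3:]]), points_in_box + shift

# Usage: rng = np.random.default_rng(0); new_box, new_pts = replace_at_distance(box, pts, rng)
```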
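Finally, a minimal sketch of the cross-modal alignment and fusion module as the abstract describes it: projected 3D features are first aligned to 2D image embeddings (here with a cosine objective) and the aligned 3D-2D pair is then fused. The feature dimensions, the loss, and the fusion MLP are assumptions for illustration.

```python
# Hedged sketch of cross-modal alignment and fusion, not the paper's module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignAndFuse(nn.Module):
    def __init__(self, dim3d=256, dim2d=512, dim_out=256):
        super().__init__()
        self.proj3d = nn.Linear(dim3d, dim2d)         # map 3D features into 2D space
        self.fuse = nn.Sequential(                    # fuse aligned 3D + 2D features
            nn.Linear(dim2d * 2, dim_out), nn.ReLU(), nn.Linear(dim_out, dim_out))

    def alignment_loss(self, feats3d, feats2d):
        """Pull each projected 3D feature toward its paired image embedding."""
        a = F.normalize(self.proj3d(feats3d), dim=-1)
        b = F.normalize(feats2d, dim=-1)
        return (1.0 - (a * b).sum(-1)).mean()         # 1 - cosine similarity

    def forward(self, feats3d, feats2d):
        aligned = self.proj3d(feats3d)
        return self.fuse(torch.cat([aligned, feats2d], dim=-1))
```

In an open-vocabulary setting, the fused features could then be scored against text embeddings of arbitrary category names for semantic decoding.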
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for
Open-Vocabulary 3D Object Detection [34.91703960513125]
CoDAv2 is a unified framework designed to tackle both the localization and classification of novel 3D objects.
CoDAv2 outperforms the previous best-performing method by a large margin.
Source code and pre-trained models are available at the GitHub project page.
arXiv Detail & Related papers (2024-06-02T18:32:37Z)
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via
Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z)
- Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and could broadly benefit existing 3D detection methods.
arXiv Detail & Related papers (2024-03-14T09:54:31Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ a deep neural network to learn distinctive 2D keypoints in the 2D image domain.
To generate the ground truth of the 2D/3D keypoints, an automatic model-fitting approach is proposed.
arXiv Detail & Related papers (2021-08-25T08:50:06Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.