OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
- URL: http://arxiv.org/abs/2312.08876v1
- Date: Tue, 12 Dec 2023 07:49:30 GMT
- Title: OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
- Authors: Hu Zhang, Jianhua Xu, Tao Tang, Haiyang Sun, Xin Yu, Zi Huang, Kaicheng Yu
- Abstract summary: OpenSight is a more advanced 2D-3D modeling framework for LiDAR-based open-vocabulary detection.
Our method establishes state-of-the-art open-vocabulary performance on widely used 3D detection benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional LiDAR-based object detection research primarily focuses on
closed-set scenarios, which limits its usefulness in complex real-world
applications. However, directly transferring existing 2D open-vocabulary models
to LiDAR, using only some known LiDAR classes to obtain open-vocabulary ability,
tends to cause over-fitting: the resulting model tends to output the known
classes even when presented with an object of a novel category. In this paper,
we propose OpenSight, a more advanced 2D-3D modeling framework for LiDAR-based
open-vocabulary detection. OpenSight first uses 2D-3D geometric priors to
discern and localize generic objects, and then interprets the specific
semantics of the detected objects. The process begins by generating 2D boxes
for generic objects from the camera images that accompany the LiDAR scans.
These 2D boxes, together with the LiDAR points, are then lifted back into LiDAR
space to estimate the corresponding 3D boxes. To improve generic object
perception, the framework integrates both temporal and spatial awareness
constraints. Temporal awareness correlates the predicted 3D boxes across
consecutive timestamps, recalibrating missed or inaccurate boxes. Spatial
awareness randomly places some "precisely" estimated 3D boxes at varying
distances, increasing the visibility of generic objects. To interpret the
specific semantics of the detected objects, we develop a cross-modal alignment
and fusion module that first aligns 3D features with 2D image embeddings and
then fuses the aligned 3D-2D features for semantic decoding. Our experiments
indicate that our method establishes state-of-the-art open-vocabulary
performance on widely used 3D detection benchmarks and effectively identifies
objects of new categories of interest.
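The step of lifting 2D boxes and LiDAR points into 3D boxes can be pictured with a minimal frustum-based sketch. It assumes the LiDAR points are already transformed into the camera frame (z pointing forward) and that the camera is a pinhole with intrinsics fx, fy, cx, cy. It keeps the points that project inside a 2D box, drops background points far from the median depth, and fits an axis-aligned 3D box. The names Box2D, Box3D, project, lift_box, and depth_window are hypothetical, and the median-depth filter is an illustrative heuristic, not OpenSight's estimator, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z pointing forward


@dataclass
class Box2D:
    u_min: float
    v_min: float
    u_max: float
    v_max: float


@dataclass
class Box3D:
    center: Point
    size: Point  # extent along x, y, z


def project(p: Point, fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection to pixel (u, v); returns None for points behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def lift_box(points: List[Point], box: Box2D, fx: float, fy: float, cx: float, cy: float,
             depth_window: float = 2.0) -> Optional[Box3D]:
    """Hypothetical lifting helper: estimate a 3D box from the points inside a 2D box."""
    # 1. Keep the points whose projection falls inside the 2D box (the viewing frustum).
    frustum = []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv and box.u_min <= uv[0] <= box.u_max and box.v_min <= uv[1] <= box.v_max:
            frustum.append(p)
    if not frustum:
        return None
    # 2. A frustum also catches background points; keep points near the median depth
    #    (an assumed heuristic, for illustration only).
    depths = sorted(p[2] for p in frustum)
    median = depths[len(depths) // 2]
    obj = [p for p in frustum if abs(p[2] - median) <= depth_window]
    # 3. Fit an axis-aligned box to the remaining object points.
    lo = [min(p[i] for p in obj) for i in range(3)]
    hi = [max(p[i] for p in obj) for i in range(3)]
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple(b - a for a, b in zip(lo, hi))
    return Box3D(center, size)
```

A real system would estimate oriented boxes and handle occlusion; the axis-aligned fit only shows how 2D boxes and LiDAR points combine to localize an object in 3D.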
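Temporal awareness, correlating predicted 3D boxes across consecutive timestamps, can be sketched as greedy association between two frames. This sketch assumes the boxes of both frames are already in a shared world frame (ego motion compensated) and uses (x, y, z, dx, dy, dz) tuples. Matched pairs keep the current position but average their extents, and previous boxes with no match are carried forward as recovered detections. The function temporal_refine, the max_dist threshold, and both recalibration rules are hypothetical choices, not the procedure described in the paper.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float, float, float]  # (x, y, z, dx, dy, dz), shared world frame


def center_dist(a: Box, b: Box) -> float:
    """Euclidean distance between two box centers."""
    return math.dist(a[:3], b[:3])


def temporal_refine(prev: List[Box], curr: List[Box], max_dist: float = 1.5) -> List[Box]:
    """Hypothetical helper: associate boxes across two frames and recalibrate them."""
    # Greedy matching: consider all pairs from closest to farthest.
    pairs = sorted(
        (center_dist(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)
    )
    used_prev, used_curr, refined = set(), set(), {}
    for d, i, j in pairs:
        if d > max_dist:
            break
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        p, c = prev[i], curr[j]
        # Assumed rule: a rigid object's extent should not change between frames,
        # so average the two size estimates while keeping the current position.
        size = tuple((a + b) / 2 for a, b in zip(p[3:], c[3:]))
        refined[j] = c[:3] + size
    # Unmatched current boxes (e.g. newly visible objects) are kept unchanged.
    out = [refined.get(j, c) for j, c in enumerate(curr)]
    # Assumed rule: boxes seen before but missed now are carried forward.
    out += [p for i, p in enumerate(prev) if i not in used_prev]
    return out
```

Carrying boxes forward unchanged only suits static or slow objects; a fuller version would predict motion before matching.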
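Spatial awareness, placing well-estimated 3D boxes at varying distances, resembles a copy-and-place augmentation. The sketch below slides an object (its box center and LiDAR points) along its bird's-eye-view viewing ray so the center lands at a chosen range, then thins the points when the object moves farther away. The helper place_at_range and the inverse-square density model are assumptions made for illustration; the abstract does not say how OpenSight chooses distances or adjusts point density.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float, float]  # (x, y, z) in the LiDAR frame, sensor at the origin


def place_at_range(center: Vec, points: List[Vec], new_range: float,
                   rng: random.Random) -> Tuple[Vec, List[Vec]]:
    """Hypothetical helper: move an object along its BEV viewing ray to new_range meters."""
    cx, cy, cz = center
    old_range = math.hypot(cx, cy)
    if old_range == 0 or new_range <= 0:
        raise ValueError("object and target range must be away from the sensor")
    scale = new_range / old_range
    # Same BEV shift for the center and every point; the height is left unchanged.
    dx, dy = cx * (scale - 1), cy * (scale - 1)
    moved = [(x + dx, y + dy, z) for x, y, z in points]
    # Assumed density model: returns on an object fall roughly with range squared,
    # so thin the points when moving farther (points cannot be added when moving closer).
    keep = min(1.0, (old_range / new_range) ** 2)
    moved = [p for p in moved if rng.random() < keep]
    return (cx + dx, cy + dy, cz), moved
```

To place a box at a random distance, draw the target range first, for example `place_at_range(c, pts, rng.uniform(20.0, 60.0), rng)`. Passing a seeded `random.Random` keeps the augmentation reproducible.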
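The cross-modal alignment and fusion module can be outlined in three small functions. This sketch assumes alignment is trained with a cosine-distance loss between paired 3D features and 2D image embeddings, fusion is an average of the unit-normalized vectors, and semantic decoding picks the class whose text embedding is most similar, in the style of CLIP zero-shot classification. All function names, and these three design choices, are hypothetical; the abstract does not give the actual loss, fusion operator, or decoder.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def alignment_loss(feat_3d: List[Vector], emb_2d: List[Vector]) -> float:
    """Assumed alignment objective: mean (1 - cosine) over paired 3D/2D features."""
    return sum(1.0 - cosine(f, e) for f, e in zip(feat_3d, emb_2d)) / len(feat_3d)


def fuse(f3d: Vector, e2d: Vector) -> Vector:
    """Assumed fusion: average of the unit-normalized 3D feature and 2D embedding."""
    return [(a + b) / 2 for a, b in zip(normalize(f3d), normalize(e2d))]


def decode(fused: Vector, text_emb: Dict[str, Vector]) -> str:
    """Assumed open-vocabulary decoding: the class name with the most similar text embedding."""
    return max(text_emb, key=lambda name: cosine(fused, text_emb[name]))
```

Because decoding only compares against text embeddings, adding a new category means adding one more entry to `text_emb`, which is what makes the classifier open-vocabulary in this sketch.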
Related papers
- Open Vocabulary Monocular 3D Object Detection (2024-11-25)
We pioneer the study of open-vocabulary monocular 3D object detection, a novel task that aims to detect and localize objects in 3D space from a single RGB image.
We introduce a class-agnostic approach that leverages open-vocabulary 2D detectors and lifts 2D bounding boxes into 3D space.
Our approach decouples the recognition and localization of objects in 2D from the task of estimating 3D bounding boxes, enabling generalization across unseen categories.
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data (2024-11-23)
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det requires no high-precision LiDAR or 3D sensor data, either as input or for generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
- General Geometry-aware Weakly Supervised 3D Object Detection (2024-07-18)
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on the KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotations.
- Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection (2024-06-02)
CoDAv2 is a unified framework designed to tackle both the localization and the classification of novel 3D objects.
CoDAv2 outperforms the previous best-performing method by a large margin.
Source code and pre-trained models are available at the GitHub project page.
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation (2024-03-28)
OV-Uni3DETR achieves state-of-the-art performance in various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
- Improving Distant 3D Object Detection Using 2D Box Supervision (2024-03-14)
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and could broadly benefit 3D detection methods.
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection (2022-09-13)
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information about the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.