Towards 3D Objectness Learning in an Open World
- URL: http://arxiv.org/abs/2510.17686v1
- Date: Mon, 20 Oct 2025 16:01:20 GMT
- Title: Towards 3D Objectness Learning in an Open World
- Authors: Taichi Liu, Zhenyu Wang, Ruofeng Liu, Guang Wang, Desheng Zhang
- Abstract summary: We propose OP3Det, a class-agnostic Open-World Prompt-free 3D Detector that detects objects within 3D scenes without relying on hand-crafted text prompts. OP3Det significantly surpasses existing open-world 3D detectors by up to 16.0% in AR and achieves a 13.5% improvement over closed-world 3D detectors.
- Score: 19.994404833308092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on 3D object detection and novel category detection has made significant progress, yet learning generalized 3D objectness remains underexplored. In this paper, we delve into open-world 3D objectness, which focuses on detecting all objects in a 3D scene, including novel objects unseen during training. Traditional closed-set 3D detectors struggle to generalize to open-world scenarios, while directly adopting 3D open-vocabulary models for open-world detection suffers from limited vocabulary expansion and semantic overlap. To achieve generalized 3D object discovery, we propose OP3Det, a class-agnostic Open-World Prompt-free 3D Detector that detects arbitrary objects within 3D scenes without relying on hand-crafted text prompts. We leverage the strong generalization and zero-shot capabilities of 2D foundation models, combining 2D semantic priors with 3D geometric priors to generate class-agnostic proposals and broaden 3D object discovery. Then, by integrating complementary information from the point cloud and the RGB image in a cross-modal mixture of experts, OP3Det dynamically routes uni-modal and multi-modal features to learn generalized 3D objectness. Extensive experiments demonstrate the strong performance of OP3Det, which surpasses existing open-world 3D detectors by up to 16.0% in AR and achieves a 13.5% improvement over closed-world 3D detectors.
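The cross-modal mixture of experts is the paper's central routing mechanism. Below is a minimal PyTorch sketch of one way such routing could work: a learned gate scores a point-cloud expert, an image expert, and a fused expert for each proposal, and the output is the gate-weighted combination. The module structure, feature dimension, and softmax gating are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (PyTorch assumed) of a cross-modal mixture of experts:
# a learned gate routes each proposal's features to a point-cloud expert,
# an image expert, or a fused expert. Names, dimensions, and the softmax
# gating scheme are illustrative assumptions, not OP3Det's implementation.
import torch
import torch.nn as nn


class CrossModalMoE(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # One expert per routing choice: point-only, image-only, fused.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(3)
        )
        self.fuse = nn.Linear(2 * dim, dim)   # builds the fused expert's input
        self.gate = nn.Linear(2 * dim, 3)     # scores the three experts

    def forward(self, pc_feat: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([pc_feat, img_feat], dim=-1)
        weights = torch.softmax(self.gate(joint), dim=-1)         # (N, 3)
        inputs = [pc_feat, img_feat, self.fuse(joint)]            # per-expert inputs
        outs = torch.stack(
            [expert(x) for expert, x in zip(self.experts, inputs)], dim=-1
        )                                                         # (N, dim, 3)
        return (outs * weights.unsqueeze(1)).sum(dim=-1)          # gated mixture


# Usage: 8 proposals, each with a 256-d point-cloud and image feature.
pc, img = torch.randn(8, 256), torch.randn(8, 256)
print(CrossModalMoE()(pc, img).shape)  # torch.Size([8, 256])
```

In a full detector a block like this would sit between the proposal features and the objectness head, letting the gate lean on the more reliable modality when, for example, the point cloud is sparse or the image view is occluded.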
Related papers
- N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models [45.008146973701855]
N3D-VLM is a novel unified framework that seamlessly integrates native 3D object perception with 3D-aware visual reasoning. Unlike conventional end-to-end models that directly predict answers from RGB/RGB-D inputs, our approach equips the model with native 3D object perception capabilities.
arXiv Detail & Related papers (2025-12-18T14:03:44Z)
- 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection [62.57179069154312]
We introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD). We lift open-set 2D detection into 3D space through a designed 3D bounding box head, and condition the object queries on a geometry prior to generalize 3D estimation across diverse scenes.
arXiv Detail & Related papers (2025-07-31T13:56:41Z)
- Detect Anything 3D in the Wild [34.293450721860616]
We introduce DetAny3D, a promptable 3D detection foundation model capable of detecting any novel object under arbitrary camera configurations. To effectively transfer 2D knowledge to 3D, DetAny3D incorporates two core modules: the 2D Aggregator and the 3D Interpreter with Zero-Embedding Mapping. Experimental results validate the strong generalization of DetAny3D, which achieves state-of-the-art performance on unseen categories and novel camera configurations.
arXiv Detail & Related papers (2025-04-10T17:59:22Z)
- Unifying 2D and 3D Vision-Language Understanding [85.84054120018625]
We introduce UniVLG, a unified architecture for 2D and 3D vision-language learning. UniVLG bridges the gap between existing 2D-centric models and the rich 3D sensory data available in embodied systems.
arXiv Detail & Related papers (2025-03-13T17:56:22Z)
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data, either as input or for generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
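As a rough illustration of that pseudo-LiDAR idea, the sketch below back-projects a depth map through a pinhole camera model and takes an axis-aligned 3D box around the points falling inside a 2D detection. The intrinsics, the 2D box format, and the axis-aligned fit are assumptions made for illustration; they are not OVM3D-Det's actual auto-labeling pipeline.

```python
# A rough sketch of the pseudo-LiDAR step: back-project a depth map through a
# pinhole camera and take an axis-aligned 3D box around the points inside a
# 2D detection. Intrinsics, box format, and the axis-aligned fit are
# illustrative assumptions, not OVM3D-Det's actual auto-labeling pipeline.
import numpy as np


def pseudo_lidar_box(depth, box2d, fx, fy, cx, cy):
    """Return (xmin, ymin, zmin, xmax, ymax, zmax) in camera coordinates."""
    x1, y1, x2, y2 = box2d
    v, u = np.mgrid[y1:y2, x1:x2]           # pixel rows (v) and columns (u)
    z = depth[y1:y2, x1:x2]
    valid = z > 0                           # drop pixels with missing depth
    x = (u[valid] - cx) * z[valid] / fx     # pinhole back-projection
    y = (v[valid] - cy) * z[valid] / fy
    points = np.stack([x, y, z[valid]], axis=-1)
    return np.concatenate([points.min(axis=0), points.max(axis=0)])


# Usage: a synthetic 480x640 depth map and one hypothetical 2D detection.
depth = np.full((480, 640), 5.0)
print(pseudo_lidar_box(depth, (100, 100, 200, 180), fx=500, fy=500, cx=320, cy=240))
```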
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
- Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model that explores unified 3D representation at scale.
Uni3D uses a 2D ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features, as sketched below.
We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
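One plausible reading of this alignment is a symmetric contrastive (InfoNCE) objective that pulls each point-cloud embedding toward its paired image-text (CLIP-style) embedding. The sketch below shows only that loss shape; the encoders, embedding size, and temperature are assumptions, not details from the Uni3D paper.

```python
# A sketch of a symmetric InfoNCE alignment loss between point-cloud
# embeddings and paired CLIP-style image/text embeddings. Encoders are
# omitted; the embedding size and temperature are assumed, not taken
# from the Uni3D paper.
import torch
import torch.nn.functional as F


def align_loss(pc_emb, clip_emb, tau=0.07):
    pc = F.normalize(pc_emb, dim=-1)
    tgt = F.normalize(clip_emb, dim=-1)
    logits = pc @ tgt.t() / tau                         # cosine similarity matrix
    labels = torch.arange(len(pc))                      # i-th cloud pairs with i-th target
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2  # symmetric InfoNCE


# Usage: 16 paired embeddings of size 512.
print(align_loss(torch.randn(16, 512), torch.randn(16, 512)))
```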
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.