DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF
- URL: http://arxiv.org/abs/2507.14596v1
- Date: Sat, 19 Jul 2025 12:46:20 GMT
- Title: DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF
- Authors: Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile, Florian Chabot, Loïc Barthe,
- Abstract summary: DiSCO-3D aims to provide a 3D semantic segmentation that adapts to both the scene and user queries. We build DiSCO-3D on Neural Fields representations, combining unsupervised segmentation with weak open-vocabulary guidance. Our evaluations demonstrate that DiSCO-3D achieves effective performance in Open-Vocabulary Sub-concepts Discovery.
- Score: 0.5409700620900997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D semantic segmentation provides high-level scene understanding for applications in robotics, autonomous systems, \textit{etc}. Traditional methods adapt exclusively to either task-specific goals (open-vocabulary segmentation) or scene content (unsupervised semantic segmentation). We propose DiSCO-3D, the first method addressing the broader problem of 3D Open-Vocabulary Sub-concepts Discovery, which aims to provide a 3D semantic segmentation that adapts to both the scene and user queries. We build DiSCO-3D on Neural Fields representations, combining unsupervised segmentation with weak open-vocabulary guidance. Our evaluations demonstrate that DiSCO-3D achieves effective performance in Open-Vocabulary Sub-concepts Discovery and exhibits state-of-the-art results in the edge cases of both open-vocabulary and unsupervised segmentation.
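The abstract describes querying a scene representation with open-vocabulary text. As a minimal illustration of that general mechanism (not DiSCO-3D's actual implementation), the sketch below scores per-point features against a text-query embedding by cosine similarity; the embedding dimension and all values are random placeholders standing in for CLIP-style features.

```python
import numpy as np

def cosine_relevance(point_feats: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between each 3D point's feature vector and a text query embedding."""
    p = point_feats / np.linalg.norm(point_feats, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return p @ t  # shape (num_points,), values in [-1, 1]

# Placeholder stand-ins for CLIP-style embeddings (dimension chosen arbitrarily).
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))   # hypothetical per-point features from a feature field
query = rng.normal(size=(512,))        # hypothetical text embedding, e.g. for "chair"
scores = cosine_relevance(feats, query)
mask = scores > 0.0                    # a naive relevance threshold for the query
```

In practice such relevance scores would seed further grouping (e.g. the unsupervised sub-concept discovery described above) rather than be thresholded directly.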
Related papers
- PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum [20.206273757144547]
PGOV3D is a novel framework that introduces a Partial-to-Global curriculum for improving open-vocabulary 3D semantic segmentation. We pre-train the model on partial scenes that provide dense semantic information but relatively simple geometry. In the second stage, we fine-tune the model on complete scene-level point clouds, which are sparser and structurally more complex.
arXiv Detail & Related papers (2025-06-30T08:13:07Z)
- Segment Any 3D-Part in a Scene from a Sentence [50.46950922754459]
This paper aims to achieve the segmentation of any 3D part in a scene based on natural language descriptions. We introduce the 3D-PU dataset, the first large-scale 3D dataset with dense part annotations. On the methodological side, we propose OpenPart3D, a 3D-input-only framework to tackle the challenges of part-level segmentation.
arXiv Detail & Related papers (2025-06-24T05:51:22Z)
- OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting [52.40697058096931]
3D Gaussian Splatting (3DGS) has emerged as a powerful representation for neural scene reconstruction. We introduce an approach for open-vocabulary 3D instance segmentation without requiring manual labeling, termed OpenSplat3D. We show results on LERF-mask and LERF-OVS as well as the full ScanNet++ validation set, demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2025-06-09T12:37:15Z)
- Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding [31.40722103849691]
MPEC is a novel learning method for open-vocabulary 3D semantic segmentation. It uses both 3D entity-language alignment and point-entity consistency across different point cloud views. Our method achieves state-of-the-art results on ScanNet for open-vocabulary 3D semantic segmentation.
arXiv Detail & Related papers (2025-04-28T05:43:14Z)
- Search3D: Hierarchical Open-Vocabulary 3D Segmentation [78.47704793095669]
We introduce Search3D, an approach to construct hierarchical open-vocabulary 3D scene representations. Unlike prior methods, Search3D shifts towards a more flexible open-vocabulary 3D search paradigm. For systematic evaluation, we contribute a scene-scale open-vocabulary 3D part segmentation benchmark based on MultiScan.
arXiv Detail & Related papers (2024-09-27T03:44:07Z)
- CUS3D: CLIP-based Unsupervised 3D Segmentation via Object-level Denoise [9.12768731317489]
We propose a novel distillation learning framework named CUS3D.
An object-level denoising projection module is designed to screen out the noise and ensure more accurate 3D features.
Based on the obtained features, a multimodal distillation learning module is designed to align the 3D feature with CLIP semantic feature space.
arXiv Detail & Related papers (2024-09-21T02:17:35Z)
- GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields [50.68719394443926]
Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF) is a novel approach offering a generalizable implicit representation of 3D scenes with open-vocabulary semantics.
GOV-NeSF exhibits state-of-the-art performance in both 2D and 3D open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2024-04-01T05:19:50Z)
- Panoptic Vision-Language Feature Fields [27.209602602110916]
We propose the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.
Our algorithm learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model.
Our method achieves panoptic segmentation performance similar to state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets.
arXiv Detail & Related papers (2023-09-11T13:41:27Z)
- Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding [57.47315482494805]
Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset.
This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories.
We propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for 3D scenes.
arXiv Detail & Related papers (2023-08-01T07:50:14Z)
- Weakly Supervised 3D Open-vocabulary Segmentation [104.07740741126119]
We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner.
We distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF).
A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process.
arXiv Detail & Related papers (2023-05-23T14:16:49Z)
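Several of the entries above (GOV-NeSF, Panoptic Vision-Language Feature Fields, Weakly Supervised 3D Open-vocabulary Segmentation) share a common idea: distilling features from a pretrained 2D model into a 3D field. As a hedged sketch of that idea only (not any paper's actual training loop), the snippet below computes a mean-squared-error distillation loss between per-pixel features rendered from a hypothetical field and teacher features from a 2D model; both arrays are random placeholders.

```python
import numpy as np

def distillation_loss(rendered_feats: np.ndarray, teacher_feats: np.ndarray) -> float:
    """MSE between per-pixel features rendered from a 3D field and
    features extracted by a pretrained 2D model (e.g. CLIP or DINO)."""
    return float(np.mean((rendered_feats - teacher_feats) ** 2))

# Placeholder arrays: a 64x64 feature image with 384-dim features per pixel.
rng = np.random.default_rng(1)
rendered = rng.normal(size=(64, 64, 384))                     # features rendered from the field
teacher = rendered + 0.01 * rng.normal(size=rendered.shape)   # noisy stand-in for 2D teacher features
loss = distillation_loss(rendered, teacher)
```

Minimizing such a loss over many viewpoints is what lets the 3D field inherit the 2D model's open-vocabulary semantics without manual 3D annotations, as the weakly supervised entry above emphasizes.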
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.