OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding
- URL: http://arxiv.org/abs/2402.15321v2
- Date: Sun, 17 Mar 2024 08:41:49 GMT
- Title: OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding
- Authors: Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, Marc Pollefeys, Leonidas Guibas, Hongbo Tian, Chunjie Wang, Xiaosheng Yan, Bingwen Wang, Xuanyang Zhang, Xiao Liu, Phuc Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham, Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby
- Abstract summary: This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023.
- Score: 96.69806736025248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the challenge hosted at the workshop, present the challenge dataset, the evaluation methodology, and brief descriptions of the winning methods. For additional details, please see https://opensun3d.github.io/index_iccv23.html.
Related papers
- Functionality understanding and segmentation in 3D scenes [6.1744362771344]
We introduce Fun3DU, the first approach designed for functionality understanding in 3D scenes.
Fun3DU uses a language model to parse the task description through Chain-of-Thought reasoning (see the sketch after this entry).
We evaluate Fun3DU on SceneFun3D, the most recent and only dataset to benchmark this task.
arXiv Detail & Related papers (2024-11-25T11:57:48Z)
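As an illustration of the Chain-of-Thought parsing step mentioned in the Fun3DU entry above, the following minimal Python sketch asks a language model to decompose a task description into a target object and its functional part. The prompt wording, the `query_llm` stub, and the two-field JSON schema are assumptions for illustration, not the paper's implementation.

```python
import json

# Hypothetical Chain-of-Thought prompt; the step structure and output schema
# are illustrative assumptions, not Fun3DU's actual prompt.
COT_PROMPT = """\
Task: "{task}"
Think step by step:
1. What object does the task involve?
2. Which functional part of that object must be interacted with?
Answer as JSON with keys "object" and "functional_part".
"""

def query_llm(prompt: str) -> str:
    # Placeholder for any chat-completion API; returns a canned answer here
    # so the sketch runs without network access.
    return '{"object": "cabinet", "functional_part": "handle"}'

def parse_task(task: str) -> dict:
    """Run the CoT prompt and extract the structured answer."""
    return json.loads(query_llm(COT_PROMPT.format(task=task)))

print(parse_task("open the top drawer of the cabinet"))
# {'object': 'cabinet', 'functional_part': 'handle'}
```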
- OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding [43.69535335079362]
Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond a closed set of object classes.
Existing approaches and benchmarks primarily focus on the open vocabulary problem within the context of object classes.
We introduce a more challenging task called Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D) to explore the open vocabulary problem beyond object classes.
arXiv Detail & Related papers (2024-08-20T17:31:48Z)
- V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results [142.5704093410454]
The V3Det Challenge 2024 aims to push the boundaries of object detection research.
The challenge consists of two tracks: Vast Vocabulary Object Detection and Open Vocabulary Object Detection.
We aim to inspire future research directions in vast vocabulary and open-vocabulary object detection.
arXiv Detail & Related papers (2024-06-17T16:58:51Z)
- Open-Vocabulary SAM3D: Towards Training-free Open-Vocabulary 3D Scene Understanding [41.96929575241655]
We introduce OV-SAM3D, a training-free method for understanding open-vocabulary 3D scenes.
This framework is designed to perform understanding tasks for any 3D scene without requiring prior knowledge of the scene.
Empirical evaluations on the ScanNet200 and nuScenes datasets demonstrate that our approach surpasses existing open-vocabulary methods in unknown open-world environments.
arXiv Detail & Related papers (2024-05-24T14:07:57Z)
- OpenMask3D: Open-Vocabulary 3D Instance Segmentation [84.58747201179654]
OpenMask3D is a zero-shot approach for open-vocabulary 3D instance segmentation.
Our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings (see the sketch after this entry).
arXiv Detail & Related papers (2023-06-23T17:36:44Z)
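A minimal sketch of the per-mask multi-view fusion described in the OpenMask3D entry above: CLIP image embeddings of one instance mask, taken from several views, are L2-normalized, averaged, and scored against a CLIP text query by cosine similarity. The plain averaging and the 512-dim feature size are assumptions for illustration; the paper's crop selection and aggregation details may differ.

```python
import numpy as np

def fuse_mask_embedding(view_embeddings: np.ndarray) -> np.ndarray:
    """Average L2-normalized CLIP image embeddings of one mask across views."""
    normed = view_embeddings / np.linalg.norm(view_embeddings, axis=1, keepdims=True)
    fused = normed.mean(axis=0)
    return fused / np.linalg.norm(fused)

def score_against_query(mask_embedding: np.ndarray, text_embedding: np.ndarray) -> float:
    """Cosine similarity between the fused mask feature and a CLIP text feature."""
    return float(mask_embedding @ (text_embedding / np.linalg.norm(text_embedding)))

# Toy example: random vectors stand in for real CLIP features (5 views, 512-dim).
rng = np.random.default_rng(0)
views = rng.normal(size=(5, 512))
query = rng.normal(size=512)
print(score_against_query(fuse_mask_embedding(views), query))
```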
- Weakly Supervised 3D Open-vocabulary Segmentation [104.07740741126119]
We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner.
We distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF); see the sketch after this entry.
A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process.
arXiv Detail & Related papers (2023-05-23T14:16:49Z)
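A hedged sketch of the distillation idea in the entry above: a small MLP stands in for the NeRF feature branch, and a cosine distillation loss measures how well its per-point features match 2D CLIP/DINO features at the corresponding pixels. The toy MLP, random stand-in features, and loss form are illustrative assumptions; a real pipeline would backpropagate this loss through the field.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 64))    # toy "field": 3D position -> hidden
W2 = rng.normal(scale=0.1, size=(64, 512))  # hidden -> CLIP-dimensional feature

def field_features(xyz: np.ndarray) -> np.ndarray:
    """Tiny ReLU MLP standing in for the NeRF's feature head."""
    return np.maximum(xyz @ W1, 0.0) @ W2

def cosine_distill_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """1 - cosine similarity between student and teacher features, averaged."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    return float(1.0 - (p * t).sum(axis=1).mean())

xyz = rng.normal(size=(128, 3))        # sampled 3D points along camera rays
teacher = rng.normal(size=(128, 512))  # 2D CLIP/DINO features at matching pixels
print(cosine_distill_loss(field_features(xyz), teacher))
```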
- A Simple Framework for Open-Vocabulary Segmentation and Detection [85.21641508535679]
We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.
We first introduce a pre-trained text encoder to encode all the visual concepts in two tasks and learn a common semantic space for them.
After pre-training, our model exhibits competitive or stronger zero-shot transferability for both segmentation and detection.
arXiv Detail & Related papers (2023-03-14T17:58:34Z)
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries (see the sketch after this entry).
arXiv Detail & Related papers (2022-11-28T18:58:36Z)
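To make the open-vocabulary querying concrete, here is an illustrative sketch in the spirit of the OpenScene entry above: dense per-point features co-embedded in CLIP space are labeled by their most similar CLIP text feature. Random vectors stand in for real CLIP encoders, and the 512-dim size is an assumption.

```python
import numpy as np

def label_points(point_feats: np.ndarray, text_feats: np.ndarray) -> np.ndarray:
    """Assign each 3D point the label whose CLIP text feature is most similar."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return (p @ t.T).argmax(axis=1)  # (num_points,) label indices

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 512))  # per-point features from the 3D model
labels = rng.normal(size=(3, 512))     # CLIP text features, e.g. ["chair", "table", "sofa"]
print(np.bincount(label_points(points, labels), minlength=3))
```

Because the label set enters only through its text embeddings, the same trained features can answer arbitrary vocabulary queries without retraining, which is what makes the approach task-agnostic.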
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.