DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
- URL: http://arxiv.org/abs/2402.19007v2
- Date: Mon, 8 Jul 2024 07:58:13 GMT
- Title: DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
- Authors: Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu,
- Abstract summary: Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments.
Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object diversity, and scene texts.
We propose a dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE)
DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios.
- Score: 28.23284296418962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object attribute diversity, and scene texts, thus exhibiting noticeable discrepancies from real-world situations. To address these issues, we propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE) that comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse distinct-attribute objects, and valuable textual hints. Besides, different from existing datasets that only provide collision checking between the agent and static obstacles, we enhance DOZE by integrating capabilities for detecting collisions between the agent and moving obstacles. This novel functionality enables the evaluation of the agents' collision avoidance abilities in dynamic environments. We test four representative ZSON methods on DOZE, revealing substantial room for improvement in existing approaches concerning navigation efficiency, safety, and object recognition accuracy. Our dataset can be found at https://DOZE-Dataset.github.io/.
Related papers
- Moving Object Segmentation in Point Cloud Data using Hidden Markov Models [0.0]
We propose a robust learning-free approach to segment moving objects in point cloud data.
The proposed approach is tested on benchmark datasets and consistently performs better than or as well as state-of-the-art methods.
arXiv Detail & Related papers (2024-10-24T10:56:02Z) - Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments [44.6372390798904]
We propose a new task denominated Personalized Instance-based Navigation (PIN), in which an embodied agent is tasked with locating and reaching a specific personal object.
In each episode, the target object is presented to the agent using two modalities: a set of visual reference images on a neutral background and manually annotated textual descriptions.
arXiv Detail & Related papers (2024-10-23T18:01:09Z) - Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS [68.47681139026666]
Video object segmentation (VOS) is a crucial task in computer vision.
Current VOS methods struggle with complex scenes and prolonged object motions.
This report introduces a discriminative spatial-temporal VOS model.
arXiv Detail & Related papers (2024-08-29T10:47:17Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent object (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware
Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z) - Object Instance Identification in Dynamic Environments [19.009931116468294]
We study the problem of identifying object instances in a dynamic environment where people interact with the objects.
We build a benchmark of more than 1,500 instances built on the EPIC-KITCHENS dataset.
Experimental results suggest that (i) robustness against instance-specific appearance change (ii) integration of low-level (e.g., color, texture) and high-level (e.g., object category) features are required for further improvement.
arXiv Detail & Related papers (2022-06-10T18:38:10Z) - Addressing Multiple Salient Object Detection via Dual-Space Long-Range
Dependencies [3.8824028205733017]
Salient object detection plays an important role in many downstream tasks.
We propose a network architecture incorporating non-local feature information in both the spatial and channel spaces.
We show that our approach accurately locates multiple salient regions even in complex scenarios.
arXiv Detail & Related papers (2021-11-04T23:16:53Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Visual Object Recognition in Indoor Environments Using Topologically
Persistent Features [2.2344764434954256]
Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots.
We propose the use of topologically persistent features, which rely on the objects' shape information, to address this challenge.
We implement the proposed method on a real-world robot to demonstrate its usefulness.
arXiv Detail & Related papers (2020-10-07T06:04:17Z) - SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.