Stubborn: A Strong Baseline for Indoor Object Navigation
- URL: http://arxiv.org/abs/2203.07359v1
- Date: Mon, 14 Mar 2022 17:55:00 GMT
- Title: Stubborn: A Strong Baseline for Indoor Object Navigation
- Authors: Haokuan Luo, Albert Yue, Zhang-Wei Hong, Pulkit Agrawal
- Abstract summary: We present a strong baseline that surpasses the performance of previously published methods on the Habitat Challenge task.
Our method is motivated by the primary failure modes of prior state-of-the-art methods: poor exploration, inaccurate object identification, and the agent getting trapped due to imprecise map construction.
- Score: 11.947727956369874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a strong baseline that surpasses the performance of previously published methods on the Habitat Challenge task of navigating to a target object in indoor environments. Our method is motivated by the primary failure modes of prior state-of-the-art methods: poor exploration, inaccurate object identification, and the agent getting trapped due to imprecise map construction. We make three contributions to mitigate these issues: (i) we show that existing map-based methods fail to effectively use semantic clues for exploration, and we present a semantic-agnostic exploration strategy (called Stubborn) that uses no learning yet surprisingly outperforms prior work; (ii) we propose a strategy for integrating temporal information to improve object identification; (iii) because inaccurate depth observations often cause the agent to get trapped in small regions, we develop a multi-scale collision map for obstacle identification that mitigates this issue.
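Each of the three contributions can be illustrated with a short sketch. For contribution (i), below is a minimal sketch of a learning-free, semantic-agnostic exploration heuristic in the spirit of Stubborn: the agent commits to a goal and only abandons it when progress stalls. The class name, the corner-cycling rule, and the stall threshold are illustrative assumptions, not the paper's exact policy.

```python
import numpy as np

class CornerGoalExplorer:
    """Learning-free exploration sketch: cycle through the four corners of
    the local map as navigation goals, switching to the next corner only
    when the agent stops making progress (hypothetical approximation of
    the paper's semantic-agnostic strategy)."""

    def __init__(self, map_size: int, stall_patience: int = 50):
        m = map_size - 1
        self.corners = [(0, 0), (0, m), (m, m), (m, 0)]
        self.idx = 0
        self.stall_patience = stall_patience  # steps tolerated without progress
        self.steps_without_progress = 0
        self.best_dist = np.inf

    def next_goal(self, agent_pos):
        goal = self.corners[self.idx]
        dist = np.hypot(agent_pos[0] - goal[0], agent_pos[1] - goal[1])
        if dist < self.best_dist - 1e-3:  # still closing in on the goal
            self.best_dist = dist
            self.steps_without_progress = 0
        else:
            self.steps_without_progress += 1
        if self.steps_without_progress > self.stall_patience:
            # Stubbornly committed until stuck, then rotate to the next corner.
            self.idx = (self.idx + 1) % 4
            self.best_dist = np.inf
            self.steps_without_progress = 0
        return self.corners[self.idx]
```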
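For contribution (ii), one way to integrate temporal information is to accumulate per-frame detector confidence in map cells instead of trusting any single frame. The aggregation rule and threshold below are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

class TemporalObjectEvidence:
    """Accumulate object-detector confidence over time in a top-down grid;
    a cell is trusted as the goal object only once enough evidence has
    been collected across frames (illustrative sketch)."""

    def __init__(self, grid_shape=(480, 480), threshold=3.0):
        self.evidence = np.zeros(grid_shape, dtype=np.float32)
        self.threshold = threshold  # assumed cutoff, not the paper's value

    def update(self, detection_mask: np.ndarray, confidence: float) -> None:
        # detection_mask: boolean grid of the map cells that the current
        # frame's detection projects onto.
        self.evidence[detection_mask] += confidence

    def goal_cells(self) -> np.ndarray:
        """Cells where accumulated evidence is high enough to trust."""
        return self.evidence > self.threshold
```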
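For contribution (iii), a multi-scale collision map can be sketched as collision counters kept at several map resolutions: repeated collisions inside a small region accumulate in a single coarse cell and eventually block it off, even when depth-based obstacle estimates miss the obstacle. The scales and threshold here are illustrative assumptions.

```python
import numpy as np

class MultiScaleCollisionMap:
    """Record collisions at several resolutions; a location counts as
    blocked if any scale has seen enough collisions there (sketch of the
    multi-scale idea, not the paper's exact construction)."""

    def __init__(self, map_size=480, scales=(1, 2, 4), threshold=3):
        self.maps = {s: np.zeros((map_size // s, map_size // s), dtype=np.int32)
                     for s in scales}
        self.threshold = threshold

    def record_collision(self, x: int, y: int) -> None:
        for s, grid in self.maps.items():
            grid[x // s, y // s] += 1

    def is_blocked(self, x: int, y: int) -> bool:
        # Coarser scales aggregate nearby collisions, catching small traps.
        return any(grid[x // s, y // s] >= self.threshold
                   for s, grid in self.maps.items())
```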
Related papers
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on the Argoverse 2 Sensor and Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
- Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation [88.84058353659107]
Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment.
We propose a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image goal navigation.
Our method surpasses previous state-of-the-art work, with a classical segmentation model (0.684 vs. 0.561 success) or a robust model (0.702 vs. 0.561 success).
arXiv Detail & Related papers (2024-02-25T07:59:10Z)
- Towards End-to-End Unsupervised Saliency Detection with Self-Supervised Top-Down Context [25.85453873366275]
We propose a self-supervised end-to-end salient object detection framework via top-down context.
We exploit self-localization from the deepest features to construct location maps, which are then leveraged to learn the most instructive segmentation guidance.
Our method achieves leading performance among the recent end-to-end methods and most of the multi-stage solutions.
arXiv Detail & Related papers (2023-10-14T08:43:22Z)
- Implicit Obstacle Map-driven Indoor Navigation Model for Robust Obstacle Avoidance [16.57243997206754]
We propose a novel implicit obstacle map-driven indoor navigation framework for robust obstacle avoidance.
A non-local target memory aggregation module is designed to leverage a non-local network to model the intrinsic relationship between target semantics and target orientation clues.
arXiv Detail & Related papers (2023-08-24T15:10:28Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
- Perspective Aware Road Obstacle Detection [104.57322421897769]
We show that road obstacle detection techniques ignore the fact that, in practice, the apparent size of the obstacles decreases as their distance to the vehicle increases.
We leverage this by computing a scale map encoding the apparent size of a hypothetical object at every image location.
We then leverage this perspective map to generate training data by injecting synthetic objects onto the road with sizes that match the perspective foreshortening (a toy sketch of such a scale map appears after this list).
arXiv Detail & Related papers (2022-10-04T17:48:42Z)
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration [47.01485765231528]
Active visual exploration aims to assist an agent with a limited field of view to understand its environment based on partial observations.
We propose the Glimpse-Attend-and-Explore model which employs self-attention to guide the visual exploration instead of task-specific uncertainty maps.
Our model provides encouraging results while being less dependent on dataset bias in driving the exploration.
arXiv Detail & Related papers (2021-08-26T11:41:03Z)
- Unsupervised Object Detection with LiDAR Clues [70.73881791310495]
We present the first practical method for unsupervised object detection with the aid of LiDAR clues.
In our approach, candidate object segments are first generated from 3D point clouds.
Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network.
The labeling process is carefully designed so as to mitigate the issue of long-tailed and open-ended distribution.
arXiv Detail & Related papers (2020-11-25T18:59:54Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment (a minimal sketch of this anticipation idea appears after this list).
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
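As referenced above, here is a toy sketch of the perspective scale map idea from "Perspective Aware Road Obstacle Detection": under a pinhole camera over a flat road, depth grows with the image row below the horizon, so the apparent pixel size of a fixed-size hypothetical object can be computed at every location. All camera parameters below are illustrative assumptions.

```python
import numpy as np

def scale_map(h=480, w=640, horizon_row=200, camera_height=1.5,
              focal_px=700.0, object_height=0.5):
    """Apparent pixel height of a hypothetical object of fixed physical
    size at every image location, assuming a flat road (toy model)."""
    rows = np.arange(h, dtype=np.float32)
    depth = np.full(h, np.inf, dtype=np.float32)
    below = rows > horizon_row
    # Flat-ground geometry: Z = f * camera_height / (v - v_horizon).
    depth[below] = focal_px * camera_height / (rows[below] - horizon_row)
    # Pinhole projection: apparent size s = f * object_height / Z.
    apparent = focal_px * object_height / depth  # 0 above the horizon
    return np.tile(apparent[:, None], (1, w))  # constant along each row
```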
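And a minimal sketch of the occupancy-anticipation idea: a small convolutional network maps a partial top-down occupancy map to occupancy estimates beyond the visible regions. The paper's model consumes egocentric RGB-D observations; this map-only architecture and its layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    """Predict occupancy beyond what has been observed from a partial map
    with two channels (occupied, explored). Toy stand-in for the paper's
    model."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # per-cell occupancy logit
        )

    def forward(self, partial_map: torch.Tensor) -> torch.Tensor:
        # partial_map: (B, 2, H, W) -> occupancy probabilities (B, 1, H, W)
        return torch.sigmoid(self.net(partial_map))
```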