Towards Navigation by Reasoning over Spatial Configurations
- URL: http://arxiv.org/abs/2105.06839v1
- Date: Fri, 14 May 2021 14:04:23 GMT
- Title: Towards Navigation by Reasoning over Spatial Configurations
- Authors: Yue Zhang, Quan Guo, Parisa Kordjamshidi
- Abstract summary: We show the importance of spatial semantics in grounding navigation instructions into visual perceptions.
We propose a neural agent that uses the elements of spatial configurations and investigate their influence on the navigation agent's reasoning ability.
- Score: 20.324906029170457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We deal with the navigation problem where the agent follows natural language
instructions while observing the environment. Focusing on language
understanding, we show the importance of spatial semantics in grounding
navigation instructions into visual perceptions. We propose a neural agent that
uses the elements of spatial configurations and investigate their influence on
the navigation agent's reasoning ability. Moreover, we model the sequential
execution order and align visual objects with spatial configurations in the
instruction. Our neural agent improves strong baselines on the seen
environments and shows competitive performance on the unseen environments.
Additionally, the experimental results demonstrate that explicit modeling of
spatial semantic elements in the instructions can improve the grounding and
spatial reasoning of the model.
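The abstract describes the approach only at a high level; as one concrete reading, aligning spatial-configuration elements (e.g., trajector, landmark, spatial indicator) with detected visual objects can be sketched as a cross-attention step. Everything below — the module name SpatialConfigGrounder, the dimensions, and the three-element configuration — is a hypothetical illustration, not the authors' code.

```python
# Hypothetical sketch: grounding spatial-configuration elements onto
# detected visual objects via cross-attention. Names and shapes are
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class SpatialConfigGrounder(nn.Module):
    def __init__(self, text_dim=768, vis_dim=2048, hidden=512):
        super().__init__()
        self.q = nn.Linear(text_dim, hidden)   # project configuration elements
        self.k = nn.Linear(vis_dim, hidden)    # project visual objects
        self.v = nn.Linear(vis_dim, hidden)

    def forward(self, config_elems, object_feats):
        # config_elems: (n_elems, text_dim)  spatial-configuration tokens
        # object_feats: (n_objects, vis_dim) detected object features
        q = self.q(config_elems)
        k = self.k(object_feats)
        attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)
        # Each configuration element becomes a mixture of the visual
        # objects it most plausibly refers to.
        return attn @ self.v(object_feats)

grounder = SpatialConfigGrounder()
elems = torch.randn(3, 768)      # e.g., "chair", "left of", "table"
objects = torch.randn(10, 2048)  # region features from a detector
grounded = grounder(elems, objects)  # (3, 512) grounded element features
```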
Related papers
- Augmented Commonsense Knowledge for Remote Object Grounding [67.30864498454805]
We propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a temporal knowledge graph for improving agent navigation.
ACK consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment.
We add a new pipeline for the commonsense-based decision-making process, which leads to more accurate local action prediction.
arXiv Detail & Related papers (2024-06-03T12:12:33Z)
- NavHint: Vision and Language Navigation Agent with a Hint Generator [31.322331792911598]
We provide indirect supervision to the navigation agent through a hint generator that provides detailed visual descriptions.
The hint generator assists the navigation agent in developing a global understanding of the visual environment.
We evaluate our method on the R2R and R4R datasets and achieve state-of-the-art results on several metrics.
arXiv Detail & Related papers (2024-02-04T16:23:16Z)
- Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation [70.76686546473994]
We introduce a novel speaker model KEFA for navigation instruction generation.
The proposed KEFA speaker achieves state-of-the-art instruction generation performance for both indoor and outdoor scenes.
arXiv Detail & Related papers (2023-07-25T09:39:59Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
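As one plausible reading of "contrasting the agent's egocentric views and semantic maps", a symmetric InfoNCE-style objective over paired view/map embeddings would look like the sketch below; the function name, temperature, and symmetric form are assumptions, not the paper's exact loss.

```python
# Minimal InfoNCE-style contrastive loss between egocentric-view and
# semantic-map embeddings; an assumed shape of the objective, not the
# paper's training code.
import torch
import torch.nn.functional as F

def view_map_contrastive(view_emb, map_emb, temperature=0.07):
    # view_emb, map_emb: (batch, dim); row i of each is a matched pair
    view_emb = F.normalize(view_emb, dim=-1)
    map_emb = F.normalize(map_emb, dim=-1)
    logits = view_emb @ map_emb.t() / temperature
    targets = torch.arange(view_emb.shape[0])
    # Symmetric loss: each view must retrieve its map, and vice versa.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```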
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Structured Exploration Through Instruction Enhancement for Object Navigation [0.0]
We propose a hierarchical learning-based method for object navigation.
The top level performs high-level planning and builds a memory at the floorplan level.
We demonstrate the effectiveness of our method in a dynamic domestic environment.
arXiv Detail & Related papers (2022-11-15T19:39:22Z)
- LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation [23.84492755669486]
In this paper, we design a neural agent with explicit Orientation and Vision modules.
Those modules learn to ground spatial information and landmark mentions in the instructions to the visual environment more effectively.
We evaluate our approach on both Room2room (R2R) and Room4room (R4R) datasets and achieve state-of-the-art results on both benchmarks.
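A minimal sketch of what separate orientation and vision scoring modules might look like, with their scores summed to rank navigable candidates; the class TwoSignalScorer, the bilinear scorers, and all dimensions are hypothetical, not the LOViS implementation.

```python
# Hypothetical two-signal scorer: one module grounds orientation
# information, the other grounds landmark/vision information, and
# their scores are fused to rank candidate viewpoints.
import torch
import torch.nn as nn

class TwoSignalScorer(nn.Module):
    def __init__(self, text_dim=512, orient_dim=128, vis_dim=2048):
        super().__init__()
        self.orient_score = nn.Bilinear(text_dim, orient_dim, 1)
        self.vision_score = nn.Bilinear(text_dim, vis_dim, 1)

    def forward(self, orient_text, vis_text, cand_orient, cand_vis):
        # orient_text, vis_text: (text_dim,) instruction states per signal
        # cand_orient: (n_cand, orient_dim); cand_vis: (n_cand, vis_dim)
        n = cand_orient.shape[0]
        s_o = self.orient_score(orient_text.expand(n, -1), cand_orient)
        s_v = self.vision_score(vis_text.expand(n, -1), cand_vis)
        return (s_o + s_v).squeeze(-1)  # higher score = better candidate
```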
arXiv Detail & Related papers (2022-09-26T14:26:50Z)
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies have witnessed a slow-down in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
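The anticipation step can be pictured as an encoder-decoder that completes a partially observed local occupancy map; the toy network below is an illustrative assumption about the general shape of such a model, not the challenge-winning architecture.

```python
# Toy encoder-decoder mapping a partially observed egocentric occupancy
# map to an anticipated (completed) one. Architecture and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1),   # in: (explored, occupied)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),
        )

    def forward(self, partial_map):
        # partial_map: (B, 2, H, W) -> logits for (explored, occupied),
        # including regions not yet directly seen by the agent
        return self.net(partial_map)

model = OccupancyAnticipator()
pred = model(torch.rand(1, 2, 64, 64))  # anticipated 64x64 local map
```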
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Neural Topological SLAM for Visual Navigation [112.73876869904]
We design topological representations for space that leverage semantics and afford approximate geometric reasoning.
We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation.
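A topological representation of this kind reduces, at its simplest, to a graph whose nodes hold place features and whose edges hold approximate relative poses; the TopoMap class below is a hypothetical minimal data structure in that spirit, not the paper's system.

```python
# Minimal topological map: nodes store semantic features of places,
# edges store approximate relative pose (distance, heading). A sketch
# under assumed conventions, not the paper's implementation.
import math

class TopoMap:
    def __init__(self):
        self.nodes = {}   # node_id -> feature vector
        self.edges = {}   # (u, v) -> (rel_distance, rel_heading)

    def add_node(self, node_id, features):
        self.nodes[node_id] = features

    def connect(self, u, v, distance, heading):
        # Store both directions; the reverse edge flips the heading.
        self.edges[(u, v)] = (distance, heading)
        self.edges[(v, u)] = (distance, (heading + math.pi) % (2 * math.pi))

    def neighbors(self, u):
        return [v for (a, v) in self.edges if a == u]

m = TopoMap()
m.add_node("kitchen", [0.1, 0.9])
m.add_node("hallway", [0.7, 0.2])
m.connect("kitchen", "hallway", distance=3.0, heading=math.pi / 2)
print(m.neighbors("kitchen"))  # ['hallway']
```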
arXiv Detail & Related papers (2020-05-25T17:56:29Z)