Virtual Guidance as a Mid-level Representation for Navigation
- URL: http://arxiv.org/abs/2303.02731v2
- Date: Sun, 17 Sep 2023 12:47:05 GMT
- Title: Virtual Guidance as a Mid-level Representation for Navigation
- Authors: Hsuan-Kung Yang, Tsung-Chih Chiang, Ting-Ru Liu, Chun-Wei Huang,
Jou-Min Liu, Chun-Yi Lee
- Abstract summary: "Virtual Guidance" is designed to visually represent non-visual instructional signals.
We evaluate our proposed method through experiments in both simulated and real-world settings.
- Score: 8.712750753534532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of autonomous navigation, effectively conveying abstract
navigational cues to agents in dynamic environments poses challenges,
particularly when the navigation information is multimodal. To address this
issue, the paper introduces a novel technique termed "Virtual Guidance," which
is designed to visually represent non-visual instructional signals. These
visual cues, rendered as colored paths or spheres, are overlaid onto the
agent's camera view, serving as easily comprehensible navigational
instructions. We evaluate our proposed method through experiments in both
simulated and real-world settings. In the simulated environments, our virtual
guidance outperforms baseline hybrid approaches in several metrics, including
adherence to planned routes and obstacle avoidance. Furthermore, we extend the
concept of virtual guidance to transform text-prompt-based instructions into a
visually intuitive format for real-world experiments. Our results validate the
adaptability of virtual guidance and its efficacy in enabling policy transfer
from simulated scenarios to real-world ones.
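To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how a planned path might be rendered onto an agent's camera view as a colored guidance cue. It assumes a pinhole camera with known intrinsics and waypoints already expressed in the camera frame; all names and parameter values are illustrative.

```python
# Minimal sketch: overlay a planned path onto a camera frame as a colored
# guidance cue. Assumes a pinhole camera with intrinsics K and waypoints
# in the camera frame (z forward); names and values are illustrative.
import numpy as np
import cv2

def overlay_virtual_guidance(frame, waypoints_cam, K, color=(0, 255, 0)):
    """Draw path waypoints (N x 3, camera frame) onto the frame."""
    pts = []
    for x, y, z in waypoints_cam:
        if z <= 0.1:                      # skip points behind or at the camera
            continue
        u = K[0, 0] * x / z + K[0, 2]     # perspective projection
        v = K[1, 1] * y / z + K[1, 2]
        pts.append((int(round(u)), int(round(v))))
    overlay = frame.copy()
    if len(pts) >= 2:
        cv2.polylines(overlay, [np.array(pts, dtype=np.int32)], False, color, 8)
    for p in pts:
        cv2.circle(overlay, p, 6, color, -1)   # sphere-like markers
    # Blend so the cue reads as translucent guidance rather than occluding pixels.
    return cv2.addWeighted(overlay, 0.6, frame, 0.4, 0.0)
```

Blending the rendered cue instead of drawing it opaquely keeps the underlying scene visible, which matters when the same image is also the policy's observation.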
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
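A schematic sketch of the frozen-backbone pattern the Flex summary describes: a pre-trained model supplies patch-wise features and only a small head is trained. The encoder here is a stand-in for a VLM, and the pooling and head are assumptions, not Flex's actual architecture.

```python
# Schematic sketch: a frozen patch-wise feature extractor driving a small
# trainable control head. The encoder stands in for a pre-trained VLM.
import torch
import torch.nn as nn

class FrozenPatchPolicy(nn.Module):
    def __init__(self, encoder, feat_dim, action_dim=4):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():    # freeze the backbone
            p.requires_grad = False
        self.head = nn.Sequential(             # only this part is trained
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, action_dim))

    def forward(self, images):
        with torch.no_grad():
            patch_feats = self.encoder(images)  # assumed (B, num_patches, feat_dim)
        pooled = patch_feats.mean(dim=1)        # simple pooling over patches
        return self.head(pooled)
```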
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- IN-Sight: Interactive Navigation through Sight [20.184155117341497]

IN-Sight is a novel approach to self-supervised path planning.
It calculates traversability scores and incorporates them into a semantic map.
To precisely navigate around obstacles, IN-Sight employs a local planner.
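An illustrative sketch of folding per-point traversability scores into a top-down grid map, loosely following the description above; the cell size, map extent, and running-average update rule are assumptions rather than IN-Sight's actual map representation.

```python
# Illustrative sketch: running average of traversability scores per grid cell.
import numpy as np

def update_traversability_map(grid, counts, points_xy, scores,
                              cell=0.1, origin=(-5.0, -5.0)):
    """Fold (x, y) traversability scores into a 2D grid map."""
    for (x, y), s in zip(points_xy, scores):
        i = int((x - origin[0]) / cell)
        j = int((y - origin[1]) / cell)
        if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
            counts[i, j] += 1
            grid[i, j] += (s - grid[i, j]) / counts[i, j]  # incremental mean
    return grid, counts
```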
arXiv Detail & Related papers (2024-08-01T07:27:54Z)
- Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks [93.38375271826202]
We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks.
We first build a simulator by integrating Gaussian splatting with quadrotor flight dynamics, and then train robust navigation policies using Liquid neural networks.
In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, programming of expert demonstration training data, and the task understanding capabilities of Liquid networks.
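The core of such an imitation-learning protocol is a behavior-cloning update on simulator-rendered frames paired with expert actions. A minimal sketch of one such step follows; the policy, optimizer, and loss form are placeholders, not the paper's setup.

```python
# Minimal behavior-cloning step on (frame, expert action) pairs.
import torch
import torch.nn.functional as F

def bc_step(policy, optimizer, frames, expert_actions):
    """One gradient step minimizing MSE to expert actions."""
    optimizer.zero_grad()
    pred = policy(frames)                     # (B, action_dim)
    loss = F.mse_loss(pred, expert_actions)   # imitation objective
    loss.backward()
    optimizer.step()
    return loss.item()
```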
arXiv Detail & Related papers (2024-06-21T13:48:37Z)
- Robust Navigation with Cross-Modal Fusion and Knowledge Transfer [16.529923581195753]
We consider the problem of improving the generalization of mobile robots.
We propose a cross-modal fusion method and a knowledge transfer framework for better generalization.
By imitating the teacher's behavior and representations, the student learns to align features from noisy multi-modal input.
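A hedged sketch of this teacher-student transfer: the student, fed noisy multi-modal input, is trained to match both the teacher's features and its outputs. The loss form and weights are assumptions, not the paper's exact objective.

```python
# Sketch: distillation loss aligning a student to a teacher on both
# intermediate features (representation) and outputs (behavior).
import torch
import torch.nn.functional as F

def distill_loss(student_feat, teacher_feat, student_out, teacher_out,
                 w_feat=1.0, w_out=1.0):
    feat_term = F.mse_loss(student_feat, teacher_feat.detach())  # align representations
    out_term = F.mse_loss(student_out, teacher_out.detach())     # imitate behavior
    return w_feat * feat_term + w_out * out_term
```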
arXiv Detail & Related papers (2023-09-23T05:16:35Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
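A minimal InfoNCE-style sketch of contrasting egocentric-view embeddings against semantic-map embeddings, in the spirit of Ego$^2$-Map; the symmetric form and temperature value are assumptions, not the paper's exact loss.

```python
# Sketch: contrastive loss between paired view and map embeddings.
import torch
import torch.nn.functional as F

def view_map_contrastive(view_emb, map_emb, temperature=0.07):
    """view_emb, map_emb: (B, D) embeddings of paired views and maps."""
    v = F.normalize(view_emb, dim=-1)
    m = F.normalize(map_emb, dim=-1)
    logits = v @ m.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Matched pairs sit on the diagonal; contrast in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```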
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Navigating to Objects in the Real World [76.1517654037993]
We present a large-scale empirical study of semantic visual navigation methods comparing methods from classical, modular, and end-to-end learning approaches.
We find that modular learning works well in the real world, attaining a 90% success rate.
In contrast, end-to-end learning does not, dropping from a 77% success rate in simulation to 23% in the real world due to a large image domain gap between simulation and reality.
arXiv Detail & Related papers (2022-12-02T01:10:47Z)
- Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation [13.207579081178716]
In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously.
Unfortunately, even though simulators are an efficient tool for training navigation policies, the resulting models often fail when transferred into the real world.
One possible solution is to provide the navigation model with mid-level visual representations containing important domain-invariant properties of the scene.
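An illustrative sketch of that solution: frozen perception modules produce mid-level representations (here, depth and surface normals), and the navigation policy consumes those instead of raw pixels. The fusion-by-concatenation and all module names are placeholder assumptions, not the benchmark's specific models.

```python
# Sketch: navigation policy fed mid-level representations from frozen
# perception modules rather than raw RGB. Modules are placeholders.
import torch
import torch.nn as nn

class MidLevelNavPolicy(nn.Module):
    def __init__(self, depth_net, normals_net, action_dim=3):
        super().__init__()
        self.depth_net, self.normals_net = depth_net, normals_net
        self.policy = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, action_dim))

    def forward(self, rgb):
        with torch.no_grad():                    # mid-level nets stay frozen
            depth = self.depth_net(rgb)          # assumed (B, 1, H, W)
            normals = self.normals_net(rgb)      # assumed (B, 3, H, W)
        fused = torch.cat([depth, normals], dim=1)  # domain-invariant input
        return self.policy(fused)
```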
arXiv Detail & Related papers (2022-02-02T15:00:44Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
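A sketch of the goal-image-conditioned interface such a system implies: current and goal observations go in, a relative waypoint and a reachability estimate come out. The shapes and the shared encoder are assumptions, not ViNG's actual architecture.

```python
# Sketch: goal-image-conditioned policy producing a waypoint and a
# reachability score. Architecture details are placeholders.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(2), nn.Flatten(),
            nn.Linear(32 * 4, obs_dim))
        self.head = nn.Linear(2 * obs_dim, 3)  # (dx, dy) waypoint + reachability

    def forward(self, obs_img, goal_img):
        z = torch.cat([self.encoder(obs_img), self.encoder(goal_img)], dim=-1)
        out = self.head(z)
        waypoint, reach = out[:, :2], torch.sigmoid(out[:, 2])
        return waypoint, reach
```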
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- Unsupervised Domain Adaptation for Visual Navigation [115.85181329193092]
We propose an unsupervised domain adaptation method for visual navigation.
Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy.
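A hedged sketch of the consistency idea: translate target-domain images toward the source domain and penalize disagreement between the policy's representations before and after translation. The generator and feature extractor are placeholders, and this specific loss form is an assumption.

```python
# Sketch: representation-consistency loss for target-to-source translation.
import torch
import torch.nn.functional as F

def representation_consistency_loss(generator, policy_features, target_imgs):
    translated = generator(target_imgs)          # target -> source style
    f_target = policy_features(target_imgs)
    f_translated = policy_features(translated)
    # Translation should preserve what the navigation policy "sees".
    return F.mse_loss(f_translated, f_target.detach())
```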
arXiv Detail & Related papers (2020-10-27T18:22:43Z)
- On Embodied Visual Navigation in Real Environments Through Habitat [20.630139085937586]
Visual navigation models based on deep learning can learn effective policies when trained on large amounts of visual observations.
To deal with the cost of collecting such observations in the real world, several simulation platforms have been proposed in order to train visual navigation policies in virtual environments efficiently.
We show that our tool can effectively help train and evaluate navigation policies on real-world observations without running navigation episodes in the real world.
arXiv Detail & Related papers (2020-10-26T09:19:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.