Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs
- URL: http://arxiv.org/abs/2404.02294v1
- Date: Tue, 2 Apr 2024 20:46:13 GMT
- Title: Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs
- Authors: Faraz Lotfi, Farnoosh Faraji, Nikhil Kakodkar, Travis Manderson, David Meger, Gregory Dudek
- Abstract summary: This paper explores leveraging large language models for map-free off-road navigation using generative AI.
We propose a method where a robot receives verbal instructions, converted to text through Whisper, and a large language model extracts landmarks, preferred terrains, and crucial adverbs translated into speed settings for constrained navigation.
- Score: 29.507826791509384
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper explores leveraging large language models for map-free off-road navigation using generative AI, reducing the need for traditional data collection and annotation. We propose a method where a robot receives verbal instructions, converted to text through Whisper, and a large language model (LLM) extracts landmarks, preferred terrains, and crucial adverbs translated into speed settings for constrained navigation. A language-driven semantic segmentation model generates text-based masks for identifying landmarks and terrain types in images. By translating 2D image points to the vehicle's motion plane using camera parameters, an MPC controller guides the vehicle towards the desired terrain. This approach enhances adaptation to diverse environments and facilitates the use of high-level instructions for navigating complex and challenging terrains.
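The abstract describes a pipeline in which an LLM turns transcribed speech into landmarks, terrain preferences, and adverb-derived speed limits, and image points on the chosen terrain are projected onto the vehicle's motion plane for the MPC controller. The sketch below illustrates two of those steps under stated assumptions: a placeholder adverb-to-speed lookup (the paper does not publish the actual mapping) and a standard pinhole back-projection onto a flat ground plane; all function names, frame conventions, and numeric values are illustrative, not the authors' implementation.

```python
"""Illustrative sketch (not the authors' code): adverb-to-speed mapping and
projection of an image point onto the vehicle's motion plane (ground plane)."""
import numpy as np

# Hypothetical adverb-to-speed table; the paper only states that crucial adverbs
# are translated into speed settings, so these values are placeholders.
ADVERB_SPEEDS_MPS = {"slowly": 0.5, "carefully": 0.8, "quickly": 2.0}

def speed_from_adverbs(adverbs, default_mps=1.0):
    """Choose the most conservative (slowest) speed among recognized adverbs."""
    speeds = [ADVERB_SPEEDS_MPS[a.lower()] for a in adverbs if a.lower() in ADVERB_SPEEDS_MPS]
    return min(speeds) if speeds else default_mps

def pixel_to_motion_plane(u, v, K, R, t):
    """Back-project pixel (u, v) onto the plane z = 0 in the vehicle frame.

    K    : 3x3 camera intrinsics.
    R, t : camera-to-vehicle rotation (3x3) and camera position (3,) in the
           vehicle frame (z up, ground at z = 0), assumed known from calibration.
    Returns the (x, y) intersection with the motion plane.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing ray, camera frame
    ray_veh = R @ ray_cam                                 # same ray, vehicle frame
    if abs(ray_veh[2]) < 1e-9:
        raise ValueError("Ray is parallel to the motion plane")
    s = -t[2] / ray_veh[2]                                # solve (t + s*ray).z = 0
    point = t + s * ray_veh
    return point[0], point[1]

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    # Forward-facing camera, 1.2 m above the ground: camera z (optical axis) maps
    # to vehicle x (forward), camera x (right) to -y, camera y (down) to -z.
    R = np.array([[0.0, 0.0, 1.0],
                  [-1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0]])
    t = np.array([0.0, 0.0, 1.2])
    print(speed_from_adverbs(["carefully", "quickly"]))   # -> 0.8
    print(pixel_to_motion_plane(320.0, 400.0, K, R, t))   # ~3.75 m ahead, centered
```

Taking the slowest admissible speed when several adverbs apply is a conservative design choice assumed here; the resulting speed limit and ground-plane point would then enter the MPC problem as a velocity constraint and a target, respectively.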
Related papers
- Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching [60.645802236700035]
Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets.
We introduce GeoText-1652, a new natural language-guided geo-localization benchmark.
This dataset is systematically constructed through an interactive human-computer process.
arXiv Detail & Related papers (2023-11-21T17:52:30Z)
- Towards Automatic Satellite Images Captions Generation Using Large Language Models [0.5439020425819]
We propose Automatic Remote Sensing Image Captioning (ARSIC) to automatically collect captions for remote sensing images.
We also present a benchmark model that adapts the pre-trained generative image2text model (GIT) to generate high-quality captions for remote-sensing images.
arXiv Detail & Related papers (2023-10-17T16:45:47Z)
- Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models [14.871309526022516]
This paper proposes an interactive navigation framework by using large language and vision-language models.
We create an action-aware costmap to perform effective path planning without fine-tuning.
All experimental results demonstrated the proposed framework's effectiveness and adaptability to diverse environments.
arXiv Detail & Related papers (2023-10-13T05:59:03Z)
- LangNav: Language as a Perceptual Representation for Navigation [63.90602960822604]
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN).
Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions.
arXiv Detail & Related papers (2023-10-11T20:52:30Z)
- A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning [70.14372215250535]
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments.
Given the scarcity of human instruction data and limited diversity in the training environments, these agents still struggle with complex language grounding and spatial language understanding.
We take 500+ indoor environments captured in densely-sampled 360-degree panoramas, construct navigation trajectories through these panoramas, and generate a visually-grounded instruction for each trajectory.
The resulting dataset of 4.2M instruction-trajectory pairs is two orders of magnitude larger than existing human-annotated datasets.
arXiv Detail & Related papers (2022-10-06T17:59:08Z)
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
- Polyline Based Generative Navigable Space Segmentation for Autonomous Visual Navigation [57.3062528453841]
We propose a representation-learning-based framework to enable robots to learn the navigable space segmentation in an unsupervised manner.
We show that the proposed PSV-Nets can learn the visual navigable space with high accuracy, even without a single label.
arXiv Detail & Related papers (2021-10-29T19:50:48Z)
- Semantic Image Alignment for Vehicle Localization [111.59616433224662]
We present a novel approach to vehicle localization in dense semantic maps using semantic segmentation from a monocular camera.
In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors.
arXiv Detail & Related papers (2021-10-08T14:40:15Z)