ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge
- URL: http://arxiv.org/abs/2504.10784v1
- Date: Tue, 15 Apr 2025 00:55:57 GMT
- Title: ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge
- Authors: Mikolaj Walczak, Uttej Kallakuri, Tinoosh Mohsenin
- Abstract summary: ATLASv2 is a novel system that integrates a fine-tuned TinyLLM, real-time object detection, and efficient path planning. We evaluate ATLASv2 in real-world environments, including a handcrafted home and office setting constructed with diverse objects and landmarks. Results show that ATLASv2 effectively interprets natural language instructions, decomposes them into low-level actions, and executes tasks with high success rates.
- Score: 0.5243460995467893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous systems deployed on edge devices face significant challenges, including resource constraints, real-time processing demands, and adapting to dynamic environments. This work introduces ATLASv2, a novel system that integrates a fine-tuned TinyLLM, real-time object detection, and efficient path planning to enable hierarchical, multi-task navigation and manipulation all on the edge device, Jetson Nano. ATLASv2 dynamically expands its navigable landmarks by detecting and localizing objects in the environment which are saved to its internal knowledge base to be used for future task execution. We evaluate ATLASv2 in real-world environments, including a handcrafted home and office setting constructed with diverse objects and landmarks. Results show that ATLASv2 effectively interprets natural language instructions, decomposes them into low-level actions, and executes tasks with high success rates. By leveraging generative AI in a fully on-board framework, ATLASv2 achieves optimized resource utilization with minimal prompting latency and power consumption, bridging the gap between simulated environments and real-world applications.
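The abstract describes a loop in which the TinyLLM decomposes a natural-language instruction into low-level actions, and newly detected objects are localized and saved to an on-board knowledge base for reuse in later tasks. The following is a minimal Python sketch of that idea; the class and function names (LandmarkKB, llm_decompose, detect_and_localize, navigate) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Landmark:
    name: str
    position: tuple  # (x, y) in the robot's map frame

@dataclass
class LandmarkKB:
    landmarks: dict = field(default_factory=dict)

    def add(self, name, position):
        # Save a newly localized object so later instructions can reuse it.
        self.landmarks[name] = Landmark(name, position)

    def lookup(self, name):
        return self.landmarks.get(name)

def execute_instruction(instruction, llm_decompose, detect_and_localize, navigate, kb):
    """Decompose an instruction into low-level actions and run them.

    llm_decompose, detect_and_localize, and navigate are hypothetical stand-ins
    for the fine-tuned TinyLLM, the object detector, and the path planner.
    """
    for action, target in llm_decompose(instruction):  # e.g. [("goto", "office desk")]
        landmark = kb.lookup(target)
        if landmark is None:
            # Unknown landmark: detect and localize it, then expand the knowledge base.
            kb.add(target, detect_and_localize(target))
            landmark = kb.lookup(target)
        if action == "goto":
            navigate(landmark.position)
```

In the actual system the detector and planner run on the Jetson Nano; this stub only captures the control flow in which unknown targets trigger detection and knowledge-base expansion.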
Related papers
- E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models [16.50787220881633]
Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions. Existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. This study introduces the Experience-and-Emotion Map (E2Map), which integrates not only LLM knowledge but also the agent's real-world experiences.
arXiv Detail & Related papers (2024-09-16T06:35:18Z) - OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model [12.096387853748938]
Air-ground robots (AGRs) are widely used in surveillance and disaster response.
Current AGR navigation systems perform well in static environments.
However, these systems face challenges in dynamic scenes with severe occlusion.
We propose OccMamba with an Efficient AGR-Planner to address these problems.
arXiv Detail & Related papers (2024-08-20T07:50:29Z) - DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control [53.80518003412016]
Building a general-purpose intelligent home-assistant agent skilled in diverse tasks specified by human commands is a long-term blueprint of embodied AI research.
We study primitive mobile manipulations for embodied agents, i.e. how to navigate and interact based on an instructed verb-noun pair.
We propose DISCO, which features non-trivial advancements in contextualized scene modeling and efficient controls.
arXiv Detail & Related papers (2024-07-20T05:39:28Z) - Cognitive Planning for Object Goal Navigation using Generative AI Models [0.979851640406258]
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
arXiv Detail & Related papers (2024-03-30T10:54:59Z) - CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over the prior state-of-the-art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents [31.499374840833124]
Inspired by the catalyzing effect that benchmarks have played in the AI field, the community is looking for new benchmarks for embodied AI.
We bring a subset of BEHAVIOR activities into Habitat 2.0 to benefit from its fast simulation speed.
arXiv Detail & Related papers (2022-06-13T21:37:31Z) - Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z) - Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation [87.03299519917019]
We propose a dual-scale graph transformer (DUET) for joint long-term action planning and fine-grained cross-modal understanding.
We build a topological map on-the-fly to enable efficient exploration in the global action space (a generic sketch of such a map appears after this list).
The proposed approach, DUET, significantly outperforms state-of-the-art methods on goal-oriented vision-and-language navigation benchmarks.
arXiv Detail & Related papers (2022-02-23T19:06:53Z) - Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [88.69873520186017]
We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
arXiv Detail & Related papers (2020-03-01T09:06:31Z)
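The DUET entry above describes building a topological map on the fly to explore in a global action space; the sketch promised there follows. It is a minimal, generic Python illustration, assuming nodes are visited or observed viewpoints and edges mark direct navigability; the names and the nearest-unvisited heuristic are illustrative assumptions, not taken from the paper.

```python
import math

class TopoMap:
    """Incrementally built topological map: nodes are viewpoints, edges mark navigability."""

    def __init__(self):
        self.nodes = {}   # node_id -> (x, y) position
        self.edges = {}   # node_id -> set of directly reachable neighbor ids

    def add_node(self, node_id, position):
        self.nodes[node_id] = position
        self.edges.setdefault(node_id, set())

    def connect(self, a, b):
        # Record that the agent can travel directly between two viewpoints.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def nearest_unvisited(self, current, visited):
        # A "global action" in this view is choosing which known-but-unvisited
        # node to head to next; here we simply pick the closest one.
        candidates = [n for n in self.nodes if n not in visited]
        if not candidates:
            return None
        return min(candidates, key=lambda n: math.dist(self.nodes[current], self.nodes[n]))
```

Real systems attach visual features to nodes and run a planner over the graph; this stub only captures the bookkeeping that lets exploration reason about places the agent has seen but not yet visited.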
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.