Cognitive Planning for Object Goal Navigation using Generative AI Models
- URL: http://arxiv.org/abs/2404.00318v2
- Date: Tue, 05 Nov 2024 17:51:36 GMT
- Title: Cognitive Planning for Object Goal Navigation using Generative AI Models
- Authors: Arjun P S, Andrew Melnik, Gora Chand Nandi
- Abstract summary: We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
- Score: 0.979851640406258
- License:
- Abstract: Recent advancements in Generative AI, particularly in Large Language Models (LLMs) and Large Vision-Language Models (LVLMs), offer new possibilities for integrating cognitive planning into robotic systems. In this work, we present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies. Our approach enables a robot to navigate unfamiliar environments by leveraging LLMs and LVLMs to understand the semantic structure of the scene. To address the challenge of representing complex environments without overwhelming the system, we propose a 3D modular scene representation, enriched with semantic descriptions. This representation is dynamically pruned using an LLM-based mechanism, which filters irrelevant information and focuses on task-specific data. By combining these elements, our system generates high-level sub-goals that guide the exploration of the robot toward the target object. We validate our approach in simulated environments, demonstrating its ability to enhance object search efficiency while maintaining scalability in complex settings.
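To make the pipeline described in the abstract concrete, here is a minimal sketch of a modular scene representation, an LLM-based pruning step, and sub-goal selection. All class names, prompts, and the `query_llm` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a modular scene representation with
# semantic descriptions, an LLM-based pruning step, and sub-goal selection.
from dataclasses import dataclass, field

@dataclass
class SceneModule:
    module_id: str                 # e.g. "kitchen_counter" (hypothetical label)
    centroid: tuple                # (x, y, z) in the map frame
    description: str               # LVLM-generated semantic description
    objects: list = field(default_factory=list)

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat-completion client."""
    raise NotImplementedError

def prune_scene(modules: list[SceneModule], target: str) -> list[SceneModule]:
    """Keep only modules the LLM judges relevant to finding the target object."""
    kept = []
    for m in modules:
        prompt = (f"Target object: {target}\nRegion: {m.description}\n"
                  "Is this region worth exploring to find the target? Answer yes or no.")
        if query_llm(prompt).strip().lower().startswith("yes"):
            kept.append(m)
    return kept

def next_subgoal(modules: list[SceneModule], target: str) -> SceneModule:
    """Ask the LLM which pruned module to explore next (assumes a non-empty list)."""
    listing = "\n".join(f"{i}: {m.description}" for i, m in enumerate(modules))
    answer = query_llm(f"Target: {target}\nRegions:\n{listing}\n"
                       "Reply with the index of the most promising region.")
    try:
        return modules[int(answer.strip())]
    except (ValueError, IndexError):
        return modules[0]          # fall back to the first candidate
```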
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
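A minimal sketch of using a pre-trained VLM as a frozen patch-wise feature extractor beneath a small trainable control head, illustrated here with CLIP from Hugging Face transformers; this is an assumption-laden stand-in, not the Flex architecture.

```python
# Illustrative sketch of a frozen patch-wise VLM feature extractor (not the Flex code).
# CLIP's vision tower stands in for the pre-trained VLM; a small trainable head maps
# pooled patch features to control outputs, as in behavior cloning.
import torch
from transformers import CLIPVisionModel, CLIPImageProcessor

vlm = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
for p in vlm.parameters():            # keep the VLM frozen
    p.requires_grad = False

policy_head = torch.nn.Sequential(    # the only trainable part
    torch.nn.Linear(vlm.config.hidden_size, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 4),          # assumed output: a 4-D velocity command
)

def act(image):
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        patches = vlm(**inputs).last_hidden_state   # [1, num_patches+1, hidden]
    return policy_head(patches.mean(dim=1))         # pool patches, predict action
```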
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM [12.934788858420752]
Go-SLAM is a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments.
Our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions.
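A sketch of open-vocabulary object lookup over a reconstructed map: each segmented object stores a vision-language embedding, and language queries are matched by cosine similarity. The structure and the `embed_text` helper are assumptions for illustration, not the Go-SLAM implementation.

```python
# Minimal sketch of open-vocabulary object lookup over a reconstructed map.
import numpy as np

def embed_text(query: str) -> np.ndarray:
    """Hypothetical text encoder (e.g. a CLIP text tower); returns a feature vector."""
    raise NotImplementedError

class SemanticMap:
    def __init__(self):
        self.positions = []   # 3D centroids of segmented objects
        self.features = []    # matching vision-language embeddings

    def add(self, position, feature):
        self.positions.append(np.asarray(position))
        self.features.append(feature / np.linalg.norm(feature))

    def locate(self, query: str):
        """Return the 3D position whose feature best matches the language query."""
        q = embed_text(query)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.features) @ q     # cosine similarity to every object
        return self.positions[int(np.argmax(scores))]
```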
arXiv Detail & Related papers (2024-09-25T13:56:08Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
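A sketch of a point-based affordance representation and the pixel-to-3D lifting it implies; the field names and back-projection step are illustrative assumptions, not MOKA's actual interface.

```python
# Illustrative sketch of a point-based affordance representation: a VLM marks 2D
# keypoints on the image, and depth plus camera intrinsics lift them to 3D targets.
from dataclasses import dataclass
import numpy as np

@dataclass
class PointAffordance:
    grasp_px: tuple          # (u, v) pixel where the object should be grasped
    function_px: tuple       # (u, v) pixel where the tool/object should act
    waypoint_px: list        # intermediate (u, v) pixels for the motion

def pixel_to_world(px, depth_m, fx, fy, cx, cy):
    """Back-project a pixel with known depth into the camera frame."""
    u, v = px
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])
```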
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
- RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation [21.387160107315797]
We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG).
The ACSG accounts for both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between different entities) in the scene.
We present the Robotic Exploration (RoboEXP) system, which incorporates the Large Multimodal Model (LMM) and an explicit memory design to enhance our system's capabilities.
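A sketch of what an action-conditioned scene graph could look like as a data structure, with edges carrying the action required to expose a child object; the schema is an assumption for illustration, not RoboEXP's data model.

```python
# Minimal sketch of an action-conditioned scene graph (ACSG).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str                 # e.g. "mug", "drawer"
    geometry: dict            # pose / bounding box (low-level information)
    semantics: str            # open-vocabulary label or description

@dataclass
class Edge:
    parent: str
    child: str
    relation: str                            # e.g. "inside", "on_top_of"
    required_action: Optional[str] = None    # e.g. "open(drawer)" to expose the child

@dataclass
class ACSG:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def actions_to_reach(self, target: str) -> list[str]:
        """Collect actions that must run before the target is observable.
        Assumes edges are stored parent-before-child."""
        needed, current = [], target
        for e in reversed(self.edges):
            if e.child == current and e.required_action:
                needed.append(e.required_action)
                current = e.parent
        return list(reversed(needed))
```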
arXiv Detail & Related papers (2024-02-23T18:27:17Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
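A minimal sketch of an interactive LLM planning loop under partial observability: the model proposes an action, the robot's observation is appended to the history, and planning continues with the new information. The prompts and helper functions are hypothetical, not the paper's interface.

```python
# Minimal sketch of interactive LLM planning for a partially observable task.
def query_llm(history: list[str]) -> str:
    """Hypothetical LLM call that returns the next action (or 'done')."""
    raise NotImplementedError

def execute(action: str) -> str:
    """Hypothetical robot/simulator step that returns a textual observation."""
    raise NotImplementedError

def interactive_plan(task: str, max_steps: int = 20) -> None:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = query_llm(history)          # LLM proposes the next action
        if action == "done":
            break
        observation = execute(action)        # partial observability: new info arrives here
        history += [f"Action: {action}", f"Observation: {observation}"]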
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning [32.045840007623276]
We introduce Robotic Vision-Language Planning (ViLa), a novel approach for long-horizon robotic planning.
ViLa directly integrates perceptual data into its reasoning and planning process.
Our evaluation, conducted in both real-robot and simulated environments, demonstrates ViLa's superiority over existing LLM-based planners.
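A sketch of feeding perceptual data directly into the planner: the current image and goal go to a multimodal model, which returns the next sub-task. The `query_vlm` helper and prompt are assumptions, not ViLa's interface.

```python
# Illustrative sketch of vision-language planning: the camera image is part of the
# planning query itself rather than being translated to text first.
def query_vlm(image, prompt: str) -> str:
    """Hypothetical multimodal model call (e.g. a GPT-4V-style chat API)."""
    raise NotImplementedError

def plan_step(image, goal: str, done_steps: list[str]) -> str:
    prompt = (f"Goal: {goal}\nCompleted steps: {done_steps}\n"
              "Given the image, what is the next primitive action? "
              "Answer 'done' if the goal is already satisfied.")
    return query_vlm(image, prompt)
```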
arXiv Detail & Related papers (2023-11-29T17:46:25Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
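A sketch of this integration pattern (LLM for dialogue understanding, open-vocabulary grounding, then grasping); every function here is a placeholder, not the WALL-E API.

```python
# Minimal sketch of chaining an LLM with visual grounding and grasping.
def query_llm(dialogue: str) -> str:
    """Hypothetical LLM call that extracts the object the user wants, e.g. 'red mug'."""
    raise NotImplementedError

def ground(image, phrase: str):
    """Hypothetical open-vocabulary detector returning a bounding box for the phrase."""
    raise NotImplementedError

def grasp(box) -> bool:
    """Hypothetical grasping pipeline (pose estimation + motion execution)."""
    raise NotImplementedError

def serve(dialogue: str, image) -> bool:
    target_phrase = query_llm(dialogue)      # natural dialogue -> object description
    box = ground(image, target_phrase)       # description -> image region
    return grasp(box)                        # region -> physical pick-up
```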
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation [58.3480730643517]
We present LGX, a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON).
Our approach makes use of Large Language Models (LLMs) for this task.
We achieve state-of-the-art zero-shot object navigation results on RoboTHOR with a success rate (SR) improvement of over 27% over the current baseline.
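A sketch of LLM-guided exploration for zero-shot object navigation: visible landmarks are described to the LLM, which supplies a commonsense prior about where the target is likely to be found. The prompt and helpers are assumptions, not the published LGX method.

```python
# Illustrative sketch of LLM-guided zero-shot object navigation.
def detect_landmarks(image) -> list[str]:
    """Hypothetical open-vocabulary detector returning visible object/region labels."""
    raise NotImplementedError

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

def choose_direction(image, target: str) -> str:
    landmarks = detect_landmarks(image)
    prompt = (f"I am looking for a '{target}'. I can currently see: {', '.join(landmarks)}. "
              "Which of these should I walk towards to find the target? Answer with one label.")
    return query_llm(prompt)   # commonsense prior, e.g. a mug is likely near the kitchen counter
```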
arXiv Detail & Related papers (2023-03-06T20:19:19Z)
- Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators [4.317864702902075]
We present the first RL-based system for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects.
We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
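A generic sketch of the episode-collection loop such a system trains on, with the wrist-camera image as the observation and reward tied to grasping the target object; the simulator and policy interfaces are placeholders, not the paper's distributed setup.

```python
# Generic sketch of targeted-grasping episode collection with active (wrist-camera) vision.
def reset_env(target_id: str):
    """Hypothetical simulator reset that places the target among clutter."""
    raise NotImplementedError

def step_env(action):
    """Hypothetical simulator step; moving the arm also moves the wrist camera."""
    raise NotImplementedError   # returns (wrist_image, reward, done)

def policy(wrist_image, target_id):
    """Hypothetical learned policy conditioned on the target identity."""
    raise NotImplementedError

def collect_episode(target_id: str, horizon: int = 50) -> float:
    wrist_image = reset_env(target_id)
    total = 0.0
    for _ in range(horizon):
        wrist_image, reward, done = step_env(policy(wrist_image, target_id))
        total += reward            # reward only for grasping the *target* object
        if done:
            break
    return total
```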
arXiv Detail & Related papers (2020-07-16T02:47:48Z)
- Object Goal Navigation using Goal-Oriented Semantic Exploration [98.14078233526476]
This work studies the problem of object goal navigation which involves navigating to an instance of the given object category in unseen environments.
We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently.
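A sketch of goal selection on an episodic semantic map: if the goal category has already been mapped, navigate to it; otherwise pick an unexplored cell as a long-term exploration goal. The grid layout and frontier choice are simplified assumptions, not the paper's learned policy.

```python
# Minimal sketch of goal selection on an episodic semantic map.
import numpy as np

def select_goal(sem_map: np.ndarray, goal_channel: int, explored: np.ndarray):
    """sem_map: [C, H, W] per-category occupancy; explored: [H, W] boolean mask."""
    goal_cells = np.argwhere(sem_map[goal_channel] > 0)
    if len(goal_cells) > 0:
        return tuple(goal_cells[0])          # goal category already mapped: go there
    frontier = np.argwhere(~explored)        # otherwise pick an unexplored cell
    return tuple(frontier[len(frontier) // 2]) if len(frontier) else None
```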
arXiv Detail & Related papers (2020-07-01T17:52:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.