Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using
Large Language Models
- URL: http://arxiv.org/abs/2310.07937v2
- Date: Mon, 25 Dec 2023 07:57:13 GMT
- Title: Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using
Large Language Models
- Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao
- Abstract summary: Co-NavGPT is an innovative framework that integrates Large Language Models as a global planner for multi-robot cooperative visual target navigation.
It encodes the explored environment data into prompts, enhancing LLMs' scene comprehension.
It then assigns exploration frontiers to each robot for efficient target search.
- Score: 10.312968200748118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In advanced human-robot interaction tasks, visual target navigation is
crucial for autonomous robots navigating unknown environments. While numerous
approaches have been developed in the past, most are designed for single-robot
operations, which often suffer from reduced efficiency and robustness due to
environmental complexities. Furthermore, learning policies for multi-robot
collaboration are resource-intensive. To address these challenges, we propose
Co-NavGPT, an innovative framework that integrates Large Language Models (LLMs)
as a global planner for multi-robot cooperative visual target navigation.
Co-NavGPT encodes the explored environment data into prompts, enhancing LLMs'
scene comprehension. It then assigns exploration frontiers to each robot for
efficient target search. Experimental results on Habitat-Matterport 3D (HM3D)
demonstrate that Co-NavGPT surpasses existing models in success rates and
efficiency without any learning process, highlighting the vast potential of
LLMs in multi-robot collaboration domains. The supplementary video, prompts,
and code can be accessed via the following link:
https://sites.google.com/view/co-navgpt
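The planning loop described above (encode the explored-map data and frontier candidates into a prompt, query the LLM, and parse a frontier assignment for each robot) can be sketched roughly as follows. This is a minimal illustration under assumed data structures and prompt wording, not the authors' released code; the `llm` callable is a placeholder for whatever chat-completion backend is used.

```python
# Illustrative sketch of LLM-as-global-planner frontier assignment.
# All names, the prompt wording, and the parsing format are assumptions
# for illustration; they do not reproduce Co-NavGPT's actual prompts.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Frontier:
    fid: int                        # frontier identifier
    centroid: Tuple[float, float]   # (x, y) centroid in map coordinates
    size: int                       # number of unexplored cells behind it


def build_prompt(target: str,
                 robots: Dict[int, Tuple[float, float]],
                 frontiers: List[Frontier]) -> str:
    """Encode the explored environment into a text prompt for the LLM planner."""
    lines = [f"Task: find a '{target}'. Assign exactly one frontier to each robot."]
    for rid, pos in robots.items():
        lines.append(f"Robot {rid} is at {pos}.")
    for f in frontiers:
        lines.append(f"Frontier {f.fid}: centroid {f.centroid}, {f.size} unexplored cells.")
    lines.append("Answer with one line per robot, e.g. 'robot 0 -> frontier 2'.")
    return "\n".join(lines)


def assign_frontiers(llm: Callable[[str], str],
                     target: str,
                     robots: Dict[int, Tuple[float, float]],
                     frontiers: List[Frontier]) -> Dict[int, int]:
    """Query the LLM and parse its reply into a robot -> frontier mapping."""
    reply = llm(build_prompt(target, robots, frontiers))
    assignment: Dict[int, int] = {}
    for line in reply.splitlines():
        line = line.strip().lower()
        if line.startswith("robot") and "->" in line:
            left, right = line.split("->", 1)
            rid = int("".join(ch for ch in left if ch.isdigit()))
            fid = int("".join(ch for ch in right if ch.isdigit()))
            assignment[rid] = fid
    return assignment


if __name__ == "__main__":
    # Dummy stand-in reply so the sketch runs without any LLM access.
    fake_llm = lambda prompt: "robot 0 -> frontier 1\nrobot 1 -> frontier 0"
    frontiers = [Frontier(0, (1.0, 2.0), 40), Frontier(1, (5.0, -1.0), 65)]
    robots = {0: (0.0, 0.0), 1: (4.0, 0.5)}
    print(assign_frontiers(fake_llm, "chair", robots, frontiers))  # {0: 1, 1: 0}
```

In the full framework, the parsed assignment would then drive each robot's local navigation toward its frontier until the target object is found.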
Related papers
- General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting [9.157222032441531]
Agentic Robotic Navigation Architecture (ARNA) is a general-purpose navigation framework that equips an LVLM-based agent with a library of perception, reasoning, and navigation tools.
At runtime, the agent autonomously defines and executes task-specific navigation workflows that iteratively query the robotic modules, reason over multimodal inputs, and select appropriate navigation actions.
ARNA achieves state-of-the-art performance, demonstrating effective exploration, navigation, and embodied question answering without relying on handcrafted plans, fixed input representations, or pre-existing maps.
arXiv Detail & Related papers (2025-06-20T20:06:14Z)
- NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation [16.554282855005766]
We present NavBench, a benchmark for training and evaluating Reinforcement Learning-based navigation policies.
Our framework standardizes task definitions, enabling different robots to tackle various navigation challenges.
By ensuring consistency between simulation and real-world deployment, NavBench simplifies the development of RL-based navigation strategies.
arXiv Detail & Related papers (2025-05-20T15:48:23Z)
- SayCoNav: Utilizing Large Language Models for Adaptive Collaboration in Decentralized Multi-Robot Navigation [10.877873071364148]
We present SayCoNav, a new approach that leverages large language models (LLMs) to automatically generate a collaboration strategy among a team of robots.
We evaluate SayCoNav on Multi-Object Navigation (MultiON) tasks, which require the team of robots to use their complementary strengths to efficiently search for multiple different objects in unknown environments.
arXiv Detail & Related papers (2025-05-19T20:58:06Z)
- Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities [65.98704516122228]
The integration of foundation models (FMs) into robotics has enabled robots to understand natural language and reason about the semantics in their environments.
This paper addresses the deployment of FM-enabled robots in the field, where missions often require a robot to operate in large-scale and unstructured environments.
We present the first demonstration of large-scale LLM-enabled robot planning in unstructured environments with several kilometers of missions.
arXiv Detail & Related papers (2025-05-14T15:28:43Z)
- VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [11.140494493881075]
We present a novel vision-language navigation (VL-Nav) system that integrates efficient spatial reasoning on low-power robots.
Unlike prior methods that rely on a single image-level feature similarity to guide a robot, our method integrates pixel-wise vision-language features with curiosity-driven exploration.
VL-Nav achieves an overall success rate of 86.3%, outperforming previous methods by 44.15%.
arXiv Detail & Related papers (2025-02-02T21:44:15Z)
- LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control [80.86089324742024]
We propose a learnable Perception-Action-Communication (LPAC) architecture for the coverage control problem.
A convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates communication between robots.
Evaluations show that the LPAC models outperform standard decentralized and centralized coverage control algorithms.
arXiv Detail & Related papers (2024-01-10T00:08:00Z)
- From Simulations to Reality: Enhancing Multi-Robot Exploration for Urban Search and Rescue [46.377510400989536]
We present a novel hybrid algorithm for efficient multi-robot exploration in unknown environments with limited communication and no global positioning information.
We redefine the local best and global best positions to suit scenarios without continuous target information.
The presented work holds promise for enhancing multi-robot exploration in scenarios with limited information and communication capabilities.
arXiv Detail & Related papers (2023-11-28T17:05:25Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments [14.179677726976056]
SayNav is a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks.
SayNav achieves state-of-the-art results and even outperforms an oracle-based baseline with strong ground-truth assumptions by more than 8% in terms of success rate.
arXiv Detail & Related papers (2023-09-08T02:24:37Z)
- Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning [3.3167319223959373]
We design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments.
The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms.
arXiv Detail & Related papers (2023-08-01T07:09:14Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
- Audio Visual Language Maps for Robot Navigation [30.33041779258644]
We propose Audio-Visual-Language Maps (AVLMaps), a unified 3D spatial map representation for storing cross-modal information from audio, visual, and language cues.
AVLMaps integrate the open-vocabulary capabilities of multimodal foundation models pre-trained on Internet-scale data by fusing their features into a centralized 3D voxel grid.
In the context of navigation, we show that AVLMaps enable robot systems to index goals in the map based on multimodal queries, e.g., textual descriptions, images, or audio snippets of landmarks.
arXiv Detail & Related papers (2023-03-13T23:17:51Z)
- GNM: A General Navigation Model to Drive Any Robot [67.40225397212717]
A general goal-conditioned model for vision-based navigation can be trained on data obtained from many distinct but structurally similar robots.
We analyze the necessary design decisions for effective data sharing across robots.
We deploy the trained GNM on a range of new robots, including an underactuated quadrotor.
arXiv Detail & Related papers (2022-10-07T07:26:41Z)
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
- Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
- Decentralized Global Connectivity Maintenance for Multi-Robot Navigation: A Reinforcement Learning Approach [12.649986200029717]
This work investigates how to navigate a multi-robot team in unknown environments while maintaining connectivity.
We propose a reinforcement learning approach to develop a decentralized policy, which is shared among multiple robots.
We validate the effectiveness of the proposed approach by comparing different combinations of connectivity constraints and behavior cloning.
arXiv Detail & Related papers (2021-09-17T13:20:19Z)
- Collaborative Visual Navigation [69.20264563368762]
We propose a large-scale 3D dataset, CollaVN, for multi-agent visual navigation (MAVN).
Diverse MAVN variants are explored to make our problem more general.
A memory-augmented communication framework is proposed. Each agent is equipped with a private, external memory to persistently store communication information.
arXiv Detail & Related papers (2021-07-02T15:48:16Z)