AD-H: Autonomous Driving with Hierarchical Agents
- URL: http://arxiv.org/abs/2406.03474v1
- Date: Wed, 5 Jun 2024 17:25:46 GMT
- Title: AD-H: Autonomous Driving with Hierarchical Agents
- Authors: Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu
- Abstract summary: We propose to connect high-level instructions and low-level control signals with mid-level language-driven commands.
We implement this idea through a hierarchical multi-agent driving system named AD-H.
- Score: 64.49185157446297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers. As a result, the generalizability of these methods is highly restricted by the autonomous driving datasets used during fine-tuning. To tackle this challenge, we propose to connect high-level instructions and low-level control signals with mid-level language-driven commands, which are more fine-grained than high-level instructions but more universal and explainable than control signals, and thus can effectively bridge the gap in between. We implement this idea through a hierarchical multi-agent driving system named AD-H, including an MLLM planner for high-level reasoning and a lightweight controller for low-level execution. The hierarchical design liberates the MLLM from low-level control signal decoding and therefore fully releases its emergent capabilities in high-level perception, reasoning, and planning. We build a new dataset with action hierarchy annotations. Comprehensive closed-loop evaluations demonstrate several key advantages of our proposed AD-H system. First, AD-H notably outperforms state-of-the-art methods in driving performance, even exhibiting self-correction capabilities during vehicle operation, a scenario not encountered in the training dataset. Second, AD-H demonstrates superior generalization under long-horizon instructions and novel environmental conditions, significantly surpassing current state-of-the-art methods. We will make our data and code publicly accessible at https://github.com/zhangzaibin/AD-H.
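For orientation, here is a minimal Python sketch of the hierarchical split the abstract describes: an MLLM planner that turns an instruction and the current observation into a mid-level language command, and a lightweight controller that decodes that command into vehicle control signals. The class names, the command vocabulary, and the method signatures below are illustrative assumptions, not AD-H's actual interfaces.

```python
# Minimal sketch of the hierarchical planner/controller idea described above.
# All names (ControlSignal, MLLMPlanner, LightweightController) and the
# command vocabulary are hypothetical; the real AD-H interfaces may differ.
from dataclasses import dataclass


@dataclass
class ControlSignal:
    steer: float      # [-1, 1]
    throttle: float   # [0, 1]
    brake: float      # [0, 1]


class MLLMPlanner:
    """High-level agent: maps (instruction, observation) to a mid-level command."""

    def plan(self, instruction: str, observation) -> str:
        # In AD-H this would be an MLLM generating free-form text such as
        # "slow down and keep to the right lane"; here we return a placeholder.
        return "follow the lane and slow down before the intersection"


class LightweightController:
    """Low-level agent: decodes a mid-level language command into control signals."""

    def act(self, command: str, observation) -> ControlSignal:
        # A real controller would be a small learned policy conditioned on the
        # command and the current sensor state; this stub only illustrates the split.
        if "slow down" in command:
            return ControlSignal(steer=0.0, throttle=0.1, brake=0.3)
        return ControlSignal(steer=0.0, throttle=0.5, brake=0.0)


def drive_step(planner, controller, instruction, observation):
    command = planner.plan(instruction, observation)   # mid-level language
    return controller.act(command, observation)        # low-level control
```

The point of the split is that the planner never has to emit numeric control tokens, so its language-generation ability is left intact for perception, reasoning, and planning.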
Related papers
- DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving [1.8434042562191815]
We present a novel autonomous driving framework, DualAD, designed to imitate human reasoning during driving.
DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring minimal reasoning, and an upper layer featuring a rule-based text encoder.
arXiv Detail & Related papers (2024-09-26T16:58:04Z)
- LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with and produces well-reasoned outputs while remaining grounded by working alongside the rule-based approach (a sketch of this hybrid pattern appears after this list).
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving [6.728693243652425]
Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability.
We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations.
arXiv Detail & Related papers (2023-10-03T11:05:14Z)
- Ground Manipulator Primitive Tasks to Executable Actions using Large Language Models [13.827349677538352]
We propose a novel approach to ground manipulator primitive tasks to robot low-level actions using large language models (LLMs).
In this way, we enable LLMs to generate position/force set-points for hybrid control.
arXiv Detail & Related papers (2023-08-13T16:52:36Z)
- Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
- From proprioception to long-horizon planning in novel environments: A hierarchical RL model [4.44317046648898]
In this work, we introduce a simple, three-level hierarchical architecture that reflects different types of reasoning.
We apply our method to a series of navigation tasks in the Mujoco Ant environment.
arXiv Detail & Related papers (2020-06-11T17:19:12Z)
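As a companion to the LLM-Assist entry above, the sketch below illustrates one plausible form of a hybrid planner: a conventional rule-based planner handles routine scenes, and an LLM-based planner is consulted only when the rule-based one reports low confidence. The class names, the confidence score, and the threshold are assumptions for illustration, not the paper's actual design.

```python
# Illustrative sketch of a hybrid rule-based + LLM planner in the spirit of the
# LLM-Assist entry above. All names, the confidence score, and the threshold are
# hypothetical; they are not taken from the paper.
from typing import Any, List, Tuple


class RuleBasedPlanner:
    def plan(self, scene: Any) -> Tuple[List[Tuple[float, float]], float]:
        # Returns (trajectory, confidence); a real planner would use lane
        # geometry, traffic rules, etc. Stubbed here.
        return [(0.0, 0.0), (1.0, 0.0)], 0.9


class LLMPlanner:
    def plan(self, scene: Any) -> List[Tuple[float, float]]:
        # A real system would prompt an LLM with a textual scene description
        # and parse its answer into a trajectory. Stubbed here.
        return [(0.0, 0.0), (1.0, 0.2)]


class HybridPlanner:
    def __init__(self, rule_planner, llm_planner, threshold: float = 0.5):
        self.rule_planner = rule_planner
        self.llm_planner = llm_planner
        self.threshold = threshold  # below this confidence, defer to the LLM

    def plan(self, scene: Any) -> List[Tuple[float, float]]:
        trajectory, confidence = self.rule_planner.plan(scene)
        if confidence >= self.threshold:
            return trajectory
        return self.llm_planner.plan(scene)
```

The design keeps routine driving cheap and deterministic while reserving the LLM for the scenarios that rule-based planners struggle with.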
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.