GPT-Driver: Learning to Drive with GPT
- URL: http://arxiv.org/abs/2310.01415v3
- Date: Tue, 5 Dec 2023 05:26:29 GMT
- Title: GPT-Driver: Learning to Drive with GPT
- Authors: Jiageng Mao, Yuxi Qian, Junjie Ye, Hang Zhao, Yue Wang
- Abstract summary: We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles.
We capitalize on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs).
We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner.
- Score: 47.14350537515685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a simple yet effective approach that can transform the OpenAI
GPT-3.5 model into a reliable motion planner for autonomous vehicles. Motion
planning is a core challenge in autonomous driving, aiming to plan a driving
trajectory that is safe and comfortable. Existing motion planners predominantly
leverage heuristic methods to forecast driving trajectories, yet these
approaches demonstrate insufficient generalization capabilities in the face of
novel and unseen driving scenarios. In this paper, we propose a novel approach
to motion planning that capitalizes on the strong reasoning capabilities and
generalization potential inherent to Large Language Models (LLMs). The
fundamental insight of our approach is the reformulation of motion planning as
a language modeling problem, a perspective not previously explored.
Specifically, we represent the planner inputs and outputs as language tokens,
and leverage the LLM to generate driving trajectories through a language
description of coordinate positions. Furthermore, we propose a novel
prompting-reasoning-finetuning strategy to stimulate the numerical reasoning
potential of the LLM. With this strategy, the LLM can describe highly precise
trajectory coordinates and also its internal decision-making process in natural
language. We evaluate our approach on the large-scale nuScenes dataset, and
extensive experiments substantiate the effectiveness, generalization ability,
and interpretability of our GPT-based motion planner. Code is now available at
https://github.com/PointsCoder/GPT-Driver.
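The core idea in the abstract, casting motion planning as a language modeling problem, can be sketched in a few lines: serialize the planner inputs as text, and parse waypoint coordinates back out of the model's language reply. The prompt fields, waypoint format, and regex below are illustrative assumptions, not GPT-Driver's actual implementation:

```python
import re

def format_planner_prompt(ego_state, objects):
    # Serialize planner inputs as natural-language tokens (hypothetical format).
    lines = [f"Ego velocity: {ego_state['v']:.1f} m/s, heading: {ego_state['heading']:.2f} rad."]
    for obj in objects:
        lines.append(f"Object {obj['name']} at (x={obj['x']:.1f}, y={obj['y']:.1f}).")
    lines.append("Plan a 3 s trajectory as (x, y) waypoints at 0.5 s intervals.")
    return "\n".join(lines)

def parse_trajectory(reply):
    # Recover numeric waypoints from the model's language description.
    return [(float(x), float(y))
            for x, y in re.findall(r"\(([-\d.]+),\s*([-\d.]+)\)", reply)]

prompt = format_planner_prompt({"v": 5.0, "heading": 0.0},
                               [{"name": "car", "x": 12.3, "y": -1.8}])
reply = "Waypoints: (2.5, 0.0), (5.0, 0.1), (7.4, 0.3)"
print(parse_trajectory(reply))  # [(2.5, 0.0), (5.0, 0.1), (7.4, 0.3)]
```

Representing coordinates as text is what lets the prompting-reasoning-finetuning strategy operate on trajectories at all: the LLM only ever sees and emits language tokens.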
Related papers
- Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving [20.33096710167997]
A generative planning model with 3D-vision language pre-training, named GPVL, is proposed for end-to-end autonomous driving.
A cross-modal language model is introduced to generate holistic driving decisions and fine-grained trajectories.
The effective, robust, and efficient performance of GPVL is believed to be crucial for the practical application of future autonomous driving systems.
arXiv Detail & Related papers (2025-01-15T15:20:46Z) - DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers [61.92571851411509]
We introduce a multimodal driving language based on interleaved image and action tokens, and develop DrivingGPT to learn joint world modeling and planning.
Our DrivingGPT demonstrates strong performance in both action-conditioned video generation and end-to-end planning, outperforming strong baselines on large-scale nuPlan and NAVSIM benchmarks.
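The interleaved image-and-action sequence can be pictured with a toy serializer; the token names below are hypothetical placeholders, not DrivingGPT's actual vocabulary:

```python
def interleave_tokens(image_tokens, action_tokens):
    # Alternate per-frame image tokens with the action taken at that frame,
    # producing one multimodal sequence for autoregressive modeling.
    seq = []
    for img, act in zip(image_tokens, action_tokens):
        seq += [f"<img:{img}>", f"<act:{act}>"]
    return seq

print(interleave_tokens(["t0", "t1"], ["steer+0.1", "steer-0.2"]))
# ['<img:t0>', '<act:steer+0.1>', '<img:t1>', '<act:steer-0.2>']
```

Predicting the next image token amounts to world modeling, while predicting the next action token amounts to planning, which is how one transformer can learn both jointly.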
arXiv Detail & Related papers (2024-12-24T18:59:37Z) - Hybrid Imitation-Learning Motion Planner for Urban Driving [0.0]
We propose a novel hybrid motion planner that integrates both learning-based and optimization-based techniques.
Our model effectively balances safety and human-likeness, mitigating the trade-off inherent in these objectives.
We validate our approach through simulation experiments and further demonstrate its efficacy by deploying it in real-world self-driving vehicles.
arXiv Detail & Related papers (2024-09-04T16:54:31Z) - Potential Based Diffusion Motion Planning [73.593988351275]
We propose a new approach towards learning potential based motion planning.
We train a neural network to capture and learn easily optimizable potentials over motion planning trajectories.
We demonstrate its inherent composability, enabling us to generalize to a multitude of different motion constraints.
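A minimal sketch of potential-based trajectory optimization: hand-written potentials (goal attraction, smoothness, obstacle repulsion) stand in for the learned neural potentials, and a finite-difference gradient keeps the example dependency-free. Composability here just means the potentials add:

```python
import numpy as np

def total_potential(traj, goal, obstacles):
    # Composable potentials: goal attraction + smoothness + obstacle repulsion.
    u = np.sum((traj[-1] - goal) ** 2)          # pull the endpoint to the goal
    u += np.sum((traj[1:] - traj[:-1]) ** 2)    # keep consecutive waypoints close
    for obs in obstacles:
        u += np.sum(np.exp(-np.sum((traj - obs) ** 2, axis=1)))  # repel from obstacles
    return u

def optimize(traj, goal, obstacles, lr=0.05, steps=200, eps=1e-4):
    # Gradient descent on the total potential, with a finite-difference gradient.
    traj = traj.copy()
    for _ in range(steps):
        grad = np.zeros_like(traj)
        for i in range(traj.size):
            hi, lo = traj.copy(), traj.copy()
            hi.flat[i] += eps
            lo.flat[i] -= eps
            grad.flat[i] = (total_potential(hi, goal, obstacles)
                            - total_potential(lo, goal, obstacles)) / (2 * eps)
        traj -= lr * grad
        traj[0] = 0.0  # the start point stays fixed at the origin
    return traj

goal = np.array([5.0, 0.0])
obstacles = [np.array([2.5, 0.3])]
traj0 = np.linspace(np.zeros(2), goal, 6)  # straight-line initialization
traj = optimize(traj0, goal, obstacles)
```

Adding a new motion constraint is just adding another term to `total_potential`, which is the composability property the summary refers to.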
arXiv Detail & Related papers (2024-07-08T17:48:39Z) - LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with, producing well-reasoned outputs while remaining grounded by working alongside the rule-based planner.
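One way to picture the rule-based/LLM pairing is a confidence-gated fallback; the threshold, scoring function, and interfaces below are assumptions for illustration, not LLM-Assist's actual arbitration logic:

```python
def hybrid_plan(scene, rule_planner, llm_planner, confidence, threshold=0.5):
    # Prefer the conventional rule-based plan; defer to the LLM-based
    # planner only when the rule-based plan scores as low-confidence.
    plan = rule_planner(scene)
    if confidence(plan, scene) >= threshold:
        return plan, "rule-based"
    return llm_planner(scene), "llm"

# Toy planners standing in for the real components.
rule = lambda scene: "keep_lane"
llm = lambda scene: "creep_forward_and_merge"
conf = lambda plan, scene: 0.9 if scene == "highway" else 0.2

print(hybrid_plan("highway", rule, llm, conf))       # ('keep_lane', 'rule-based')
print(hybrid_plan("construction", rule, llm, conf))  # ('creep_forward_and_merge', 'llm')
```

Keeping the rule-based planner as the default is what grounds the system: the LLM is only consulted in the long-tail scenarios the rules handle poorly.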
arXiv Detail & Related papers (2023-12-30T02:53:45Z) - Interpretable and Flexible Target-Conditioned Neural Planners For Autonomous Vehicles [22.396215670672852]
Prior work only learns to estimate a single planning trajectory, while there may be multiple acceptable plans in real-world scenarios.
We propose an interpretable neural planner to regress a heatmap, which effectively represents multiple potential goals in the bird's-eye view of an autonomous vehicle.
Our systematic evaluation on the Lyft Open dataset shows that our model achieves a safer and more flexible driving performance than prior works.
arXiv Detail & Related papers (2023-09-23T22:13:03Z) - Integration of Reinforcement Learning Based Behavior Planning With Sampling Based Motion Planning for Automated Driving [0.5801044612920815]
We propose a method to employ a trained deep reinforcement learning policy for dedicated high-level behavior planning.
To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner.
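The two-layer split can be pictured as an RL policy choosing a discrete behavior, with a sampling-based layer proposing trajectories around it. All names and interfaces here are hypothetical stand-ins, not the paper's implementation:

```python
import random

random.seed(0)  # deterministic for the example

# Nominal lateral end-offset (m) for each discrete behavior.
BEHAVIOR_OFFSETS = {"keep_lane": 0.0, "change_left": 3.5, "change_right": -3.5}

def rl_behavior(state):
    # Stand-in for a trained deep-RL policy over discrete behaviors.
    return "change_left" if state.get("lane_blocked") else "keep_lane"

def sample_motion_plans(behavior, n=20):
    # Sampling-based motion layer: candidate end-offsets near the
    # nominal target of the chosen behavior.
    target = BEHAVIOR_OFFSETS[behavior]
    return [target + random.uniform(-0.5, 0.5) for _ in range(n)]

def plan(state):
    behavior = rl_behavior(state)
    candidates = sample_motion_plans(behavior)
    # Select the candidate closest to the nominal offset (a trivial cost).
    return behavior, min(candidates, key=lambda c: abs(c - BEHAVIOR_OFFSETS[behavior]))

behavior, offset = plan({"lane_blocked": True})
```

The division of labor is the point: the learned policy handles the discrete decision, while the sampler retains the safety and feasibility checks of classical motion planning.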
arXiv Detail & Related papers (2023-04-17T13:49:55Z) - End-to-end Interpretable Neural Motion Planner [78.69295676456085]
We propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios.
We design a holistic model that takes as input raw LIDAR data and an HD map and produces interpretable intermediate representations.
We demonstrate the effectiveness of our approach in real-world driving data captured in several cities in North America.
arXiv Detail & Related papers (2021-01-17T14:16:12Z) - The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well-defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
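A structured prior expressed as a loss term might look like the following, using lane-centerline deviation as one simple stand-in for the road-geometry priors the summary describes (the actual priors and weighting in the paper differ):

```python
import numpy as np

def prior_loss(pred_traj, lane_center_y, weight=0.1):
    # Structured road prior as a penalty: lateral deviation from the
    # lane centerline, added on top of the usual data term.
    return weight * np.mean((pred_traj[:, 1] - lane_center_y) ** 2)

def total_loss(pred_traj, gt_traj, lane_center_y):
    imitation = np.mean((pred_traj - gt_traj) ** 2)  # data term
    return imitation + prior_loss(pred_traj, lane_center_y)

gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
on_lane = gt.copy()
off_lane = gt + np.array([0.0, 1.0])  # laterally offset prediction
print(total_loss(on_lane, gt, 0.0) < total_loss(off_lane, gt, 0.0))  # True
```

Folding the prior into the loss lets a standard prediction network absorb road structure without any architectural change.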
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.