CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
- URL: http://arxiv.org/abs/2503.07234v1
- Date: Mon, 10 Mar 2025 12:17:38 GMT
- Title: CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
- Authors: Haicheng Liao, Hanlin Kong, Bonan Wang, Chengyue Wang, Wang Ye, Zhengbing He, Chengzhong Xu, Zhenning Li,
- Abstract summary: CoT-Drive is a novel approach that enhances motion forecasting by leveraging large language models (LLMs) and a chain-of-thought (CoT) prompting method. We introduce a teacher-student knowledge distillation strategy to effectively transfer LLMs' advanced scene understanding capabilities to lightweight language models (LMs). We present two new scene description datasets, Highway-Text and Urban-Text, designed for fine-tuning lightweight LMs to generate context-specific semantic annotations.
- Score: 14.567180355849501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate motion forecasting is crucial for safe autonomous driving (AD). This study proposes CoT-Drive, a novel approach that enhances motion forecasting by leveraging large language models (LLMs) and a chain-of-thought (CoT) prompting method. We introduce a teacher-student knowledge distillation strategy to effectively transfer LLMs' advanced scene understanding capabilities to lightweight language models (LMs), ensuring that CoT-Drive operates in real-time on edge devices while maintaining comprehensive scene understanding and generalization capabilities. By leveraging CoT prompting techniques for LLMs without additional training, CoT-Drive generates semantic annotations that significantly improve the understanding of complex traffic environments, thereby boosting the accuracy and robustness of predictions. Additionally, we present two new scene description datasets, Highway-Text and Urban-Text, designed for fine-tuning lightweight LMs to generate context-specific semantic annotations. Comprehensive evaluations on five real-world datasets demonstrate that CoT-Drive outperforms existing models, highlighting its effectiveness and efficiency in handling complex traffic scenarios. Overall, this study is the first to consider the practical application of LLMs in this field. It pioneers the training and use of a lightweight LLM surrogate for motion forecasting, setting a new benchmark and showcasing the potential of integrating LLMs into AD systems.
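The CoT prompting and teacher-student distillation pipeline described in the abstract can be sketched roughly as follows. The prompt wording, the stubbed `query_llm` call, and the (scene, annotation) pair collection are illustrative assumptions for a minimal sketch, not the authors' actual implementation:

```python
# Hedged sketch: a CoT prompt elicits step-by-step scene reasoning from a
# teacher LLM; the resulting annotations become fine-tuning targets for a
# lightweight student LM. All names here are illustrative placeholders.

def build_cot_prompt(scene: str) -> str:
    """Compose a chain-of-thought prompt that asks the model to reason
    step by step before emitting a semantic scene annotation."""
    return (
        "You are a traffic-scene analyst.\n"
        f"Scene: {scene}\n"
        "Step 1: Identify the agents and their rough trajectories.\n"
        "Step 2: Note interactions and right-of-way constraints.\n"
        "Step 3: Summarize the scene in one semantic annotation."
    )

def query_llm(prompt: str) -> str:
    """Stand-in for a call to a large teacher LLM (placeholder only)."""
    return "Annotation: dense highway merge; ego yields to the faster lane."

def annotate_scene(scene: str) -> str:
    return query_llm(build_cot_prompt(scene))

# Distillation target: the student LM would be fine-tuned to reproduce the
# teacher's annotations on collected (scene, annotation) pairs.
pairs = [(s, annotate_scene(s)) for s in ["5 vehicles merging at an on-ramp"]]
print(pairs[0][1])
```

The key design point the paper emphasizes is that the teacher LLM needs no additional training; only the lightweight student LM is fine-tuned, which is what allows real-time inference on edge devices.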
Related papers
- Dynamic Path Navigation for Motion Agents with LLM Reasoning [69.5875073447454]
Large Language Models (LLMs) have demonstrated strong generalizable reasoning and planning capabilities.
We explore the zero-shot navigation and path generation capabilities of LLMs by constructing a dataset and proposing an evaluation protocol.
We demonstrate that, when tasks are well-structured in this manner, modern LLMs exhibit substantial planning proficiency in avoiding obstacles while autonomously refining navigation with the generated motion to reach the target.
arXiv Detail & Related papers (2025-03-10T13:39:09Z) - TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy. A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - SenseRAG: Constructing Environmental Knowledge Bases with Proactive Querying for LLM-Based Autonomous Driving [10.041702058108482]
This study addresses the critical need for enhanced situational awareness in autonomous driving (AD) by leveraging the contextual reasoning capabilities of large language models (LLMs). Unlike traditional perception systems that rely on rigid, label-based annotations, it integrates real-time, multimodal sensor data into a unified, LLM-readable knowledge base. Experimental results using real-world Vehicle-to-everything (V2X) datasets demonstrate significant improvements in perception and prediction performance.
arXiv Detail & Related papers (2025-01-07T05:15:46Z) - OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework [3.8320050452121692]
We introduce OWLed, the Outlier-Weighed Layerwise Pruning for Efficient Autonomous Driving Framework.
Our method assigns non-uniform sparsity ratios to different layers based on the distribution of outlier features.
To ensure the compressed model adapts well to autonomous driving tasks, we incorporate driving environment data into both the calibration and pruning processes.
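The non-uniform sparsity assignment described above can be sketched as follows. The outlier criterion (magnitudes exceeding a multiple of the mean) and the linear allocation rule are illustrative assumptions, not OWLed's exact formulation:

```python
import numpy as np

def layer_outlier_ratio(activations: np.ndarray, k: float = 3.0) -> float:
    """Fraction of activations whose magnitude exceeds k times the
    layer's mean absolute activation (a simple outlier criterion)."""
    mags = np.abs(activations)
    return float(np.mean(mags > k * mags.mean()))

def assign_sparsity(per_layer_acts, target_sparsity=0.5, spread=0.2):
    """Shift each layer's sparsity away from the global target in
    proportion to how outlier-rich it is: layers with more outlier
    features are pruned less (lower sparsity)."""
    ratios = np.array([layer_outlier_ratio(a) for a in per_layer_acts])
    centered = ratios - ratios.mean()          # outlier-rich layers > 0
    scale = np.abs(centered).max() or 1.0      # avoid division by zero
    sparsities = target_sparsity - spread * centered / scale
    return np.clip(sparsities, 0.0, 1.0)
```

In line with the summary above, the activations fed to `layer_outlier_ratio` would come from calibration data drawn from the driving environment, so that the per-layer sparsity budget reflects the deployment distribution.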
arXiv Detail & Related papers (2024-11-12T10:55:30Z) - Strada-LLM: Graph LLM for traffic prediction [62.2015839597764]
A considerable challenge in traffic prediction lies in handling the diverse data distributions caused by vastly different traffic conditions. We propose a graph-aware LLM for traffic prediction that considers proximal traffic information. We adopt a lightweight approach for efficient domain adaptation when facing new data distributions in a few-shot fashion.
arXiv Detail & Related papers (2024-10-28T09:19:29Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models [12.687494201105066]
This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) to generate future motion from agents' past/observed trajectories and scene semantics.
LLMs' powerful comprehension abilities capture a spectrum of high-level scene knowledge and interactive information.
Emulating the human-like lane focus cognitive function, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module.
arXiv Detail & Related papers (2024-05-08T09:28:04Z) - Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
However, LLMs fall short in recognizing and utilizing temporal information, resulting in poor performance on tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z) - LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models [8.624969693477448]
Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability.
We propose LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models.
arXiv Detail & Related papers (2024-03-27T08:34:55Z) - Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving [13.879945446114956]
We utilize Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks.
Considering the cost associated with LLMs, we propose a cost-effective deployment strategy.
Our research offers valuable insights into enhancing LLMs' understanding of traffic scenes and improving the motion prediction performance of autonomous driving.
arXiv Detail & Related papers (2024-03-17T02:06:49Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)