Making Large Language Models Better Planners with Reasoning-Decision Alignment
- URL: http://arxiv.org/abs/2408.13890v1
- Date: Sun, 25 Aug 2024 16:43:47 GMT
- Title: Making Large Language Models Better Planners with Reasoning-Decision Alignment
- Authors: Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang,
- Abstract summary: We motivate an end-to-end decision-making model based on multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
- Score: 70.5381163219608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-driven approaches for autonomous driving (AD) have been widely adopted in the past decade but are confronted with dataset bias and uninterpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that the pretrain-finetune paradigm of LLMs on downstream data with the Chain-of-Thought (CoT) reasoning process can enhance explainability and scene understanding. However, such a popular strategy proves to suffer from the notorious problems of misalignment between the crafted CoTs against the consequent decision-making, which remains untouched by previous LLM-based AD methods. To address this problem, we motivate an end-to-end decision-making model based on multimodality-augmented LLM, which simultaneously executes CoT reasoning and carries out planning results. Furthermore, we propose a reasoning-decision alignment constraint between the paired CoTs and planning results, imposing the correspondence between reasoning and decision-making. Moreover, we redesign the CoTs to enable the model to comprehend complex scenarios and enhance decision-making performance. We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver. Experimental evaluations on the nuScenes and DriveLM-nuScenes benchmarks demonstrate the effectiveness of our RDA-Driver in enhancing the performance of end-to-end AD systems. Specifically, our RDA-Driver achieves state-of-the-art planning performance on the nuScenes dataset with 0.80 L2 error and 0.32 collision rate, and also achieves leading results on challenging DriveLM-nuScenes benchmarks with 0.82 L2 error and 0.38 collision rate.
Related papers
- TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy.
A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - Reinforcing Thinking through Reasoning-Enhanced Reward Models [6.636512424910708]
Large Language Models (LLMs) exhibit great potential in complex multi-step reasoning through inference-time thinking.
LLMs struggle with deciding when to stop thinking due to limited self-awareness about their knowledge boundaries.
This work addresses these challenges by distilling the LLM's own reasoning processes into synthetic behavioral data.
arXiv Detail & Related papers (2024-12-31T04:50:15Z) - Linear Discriminant Analysis in Credit Scoring: A Transparent Hybrid Model Approach [9.88281854509076]
We implement Linear Discriminant Analysis (LDA) as a feature reduction technique, which reduces the burden of the models complexity.
Our hybrid model, XG-DNN, outperformed other models with the highest accuracy of 99.45% and a 99% F1 score with LDA.
To interpret model decisions, we have applied 2 different explainable AI techniques named LIME (local) and Morris Sensitivity Analysis (global)
arXiv Detail & Related papers (2024-12-05T14:21:18Z) - The Role of Deductive and Inductive Reasoning in Large Language Models [37.430396755248104]
We propose the Deductive and InDuctive(DID) method to enhance Large Language Models (LLMs) reasoning.
DID implements a dual-metric complexity evaluation system that combines Littlestone dimension and information entropy.
Our results demonstrate significant improvements in reasoning quality and solution accuracy.
arXiv Detail & Related papers (2024-10-03T18:30:47Z) - Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework [79.088116316919]
Connected Autonomous Vehicles (CAVs) have begun to open road testing around the world, but their safety and efficiency performance in complex scenarios is still not satisfactory.
This paper proposes CoDrivingLLM, an interactive and learnable LLM-driven cooperative driving framework.
arXiv Detail & Related papers (2024-09-19T14:36:00Z) - Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC.
We increase the consistency and informativeness of the pairwise preference signals through targeted modifications.
We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z) - Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving [43.156632952193966]
Traditional end-to-end driving models suffer from long-tail events due to rare or unseen inputs within their training distributions.
We propose TOKEN, a novel Multi-Modal Large Language Model (MM-LLM) that tokenizes the world into object-level knowledge.
ToKEN effectively alleviates data scarcity and inefficient tokenization by leveraging a traditional end-to-end driving model.
arXiv Detail & Related papers (2024-07-01T04:34:50Z) - Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.