Improving Large Language Model Planning with Action Sequence Similarity
- URL: http://arxiv.org/abs/2505.01009v1
- Date: Fri, 02 May 2025 05:16:17 GMT
- Title: Improving Large Language Model Planning with Action Sequence Similarity
- Authors: Xinran Zhao, Hanie Sedghi, Bernd Bohnet, Dale Schuurmans, Azade Nova,
- Abstract summary: In this work, we explore how to improve the model planning capability through in-context learning (ICL). We propose GRASE-DC: a two-stage pipeline that first re-samples high-AS exemplars and then curates the selected exemplars. Our experimental results confirm that GRASE-DC achieves significant performance improvements on various planning tasks.
- Score: 50.52049888490524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Planning is essential for artificial intelligence systems to look ahead and proactively determine a course of action to reach objectives in the virtual and real world. Recent work on large language models (LLMs) sheds light on their planning capability in various tasks. However, it remains unclear what signals in the context influence the model performance. In this work, we explore how to improve the model planning capability through in-context learning (ICL), specifically, what signals can help select the exemplars. Through extensive experiments, we observe that commonly used problem similarity may result in false positives with drastically different plans, which can mislead the model. In response, we propose to sample and filter exemplars leveraging plan-side action sequence similarity (AS). We propose GRASE-DC: a two-stage pipeline that first re-samples high-AS exemplars and then curates the selected exemplars with dynamic clustering on AS to achieve a balance of relevance and diversity. Our experimental results confirm that GRASE-DC achieves significant performance improvements on various planning tasks (up to ~11-40 point absolute accuracy improvement with 27.3% fewer exemplars needed on average). With GRASE-DC* + VAL, where we iteratively apply GRASE-DC with a validator, we are able to boost the performance by a further 18.9%. Extensive analysis validates the consistent performance improvement of GRASE-DC with various backbone LLMs and on both classical planning and natural language planning benchmarks. GRASE-DC can further boost planning accuracy by ~24 absolute points over a random baseline on harder problems when simpler problems are used as exemplars, demonstrating its ability to generalize to out-of-distribution problems.
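The abstract names the two stages of GRASE-DC but, naturally, gives no implementation details. Below is a minimal sketch of the idea under explicit assumptions: a Jaccard overlap over action unigrams/bigrams stands in for the AS metric, the query is represented by a draft plan, and a fixed-k KMeans replaces the dynamic clustering described in the paper. Every helper name here is illustrative rather than taken from the authors' code.

```python
# Minimal sketch of exemplar selection by action-sequence similarity (AS),
# loosely following the two-stage idea in the abstract. All helper names,
# the AS metric, and the fixed-k clustering are illustrative assumptions,
# not the authors' implementation.
from typing import Dict, List, Tuple

import numpy as np
from sklearn.cluster import KMeans


def as_similarity(plan_a: List[str], plan_b: List[str]) -> float:
    """Jaccard overlap of action unigrams and bigrams (a stand-in AS metric)."""
    def grams(p: List[str]) -> set:
        return set(zip(p, p[1:])) | {(a,) for a in p}
    ga, gb = grams(plan_a), grams(plan_b)
    return len(ga & gb) / max(len(ga | gb), 1)


def grase_dc(draft_plan: List[str],
             candidates: List[Tuple[str, List[str]]],  # (exemplar text, exemplar plan)
             top_k: int = 20,
             n_clusters: int = 5) -> List[str]:
    """Stage 1 (GRASE): re-sample candidates whose plans have high AS to a
    draft plan for the query; Stage 2 (DC): cluster the survivors on AS and
    keep one exemplar per cluster to balance relevance and diversity."""
    if not candidates:
        return []
    # Stage 1: keep the top_k candidates most similar to the query's draft plan.
    scored = sorted(candidates,
                    key=lambda c: as_similarity(draft_plan, c[1]),
                    reverse=True)[:top_k]
    # Stage 2: featurize each survivor by its AS to every other survivor,
    # cluster, and keep the most relevant exemplar per cluster.
    feats = np.array([[as_similarity(p, q) for _, q in scored] for _, p in scored])
    k = min(n_clusters, len(scored))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    picked: Dict[int, str] = {}
    for (text, _), label in zip(scored, labels):
        picked.setdefault(label, text)  # `scored` is already sorted by relevance
    return list(picked.values())
```

A plausible use is to concatenate the exemplar texts returned by `grase_dc` into the ICL prompt ahead of the query problem.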
Related papers
- PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving [66.42260489147617]
We introduce PLAN-TUNING, a framework that distills synthetic task decompositions from large-scale language models. PLAN-TUNING fine-tunes smaller models via supervised and reinforcement-learning objectives to improve complex reasoning. Our analysis demonstrates how planning trajectories improve complex reasoning capabilities.
arXiv Detail & Related papers (2025-07-10T07:30:44Z)
- Learning Causal Graphs at Scale: A Foundation Model Approach [28.966180222166766]
We propose Attention-DAG (ADAG), a novel attention-mechanism-based architecture for learning multiple linear Structural Equation Models (SEMs). ADAG learns the mapping from observed data to both graph structure and parameters via a nonlinear attention-based kernel. We evaluate our proposed approach on benchmark synthetic datasets and find that ADAG achieves substantial improvements in both DAG learning accuracy and zero-shot inference efficiency.
arXiv Detail & Related papers (2025-06-23T04:41:02Z)
- Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance. We propose Iterative Agent Decoding (IAD), which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier (see the illustrative sketch after this list).
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [31.509112804985133]
Reinforcement learning (RL) learns policies through trial and error, while optimal control plans actions using a learned or known dynamics model. We systematically analyze the performance of different RL and control-based methods under datasets of varying quality. Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- GRAPE: Generalizing Robot Policy via Preference Alignment [58.419992317452376]
We present GRAPE: Generalizing Robot Policy via Preference Alignment. We show GRAPE increases success rates on in-domain and unseen manipulation tasks by 51.79% and 58.20%, respectively. GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 37.44% and rollout step-length by 11.15%, respectively.
arXiv Detail & Related papers (2024-11-28T18:30:10Z)
- Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model [14.480267340831542]
We propose Structure-aware Planning with an Accurate World Model (SWAP), which integrates structured knowledge representation with learned planning. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks.
arXiv Detail & Related papers (2024-10-04T04:23:36Z)
- Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning [10.704716790096498]
Large language models (LLMs) have demonstrated impressive task-solving capabilities through prompting techniques and system designs. For planning tasks with limited prior data, the performance of LLMs, including proprietary models like GPT and Gemini, is poor. This paper investigates the impact of fine-tuning on the planning capabilities of LLMs.
arXiv Detail & Related papers (2024-06-15T03:06:14Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have demonstrated impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
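As referenced in the "Review, Refine, Repeat" entry above, the following is a generic sketch of verifier-guided best-of-N sampling with iterative refinement. The callables `generate`, `verify`, and `refine` are hypothetical placeholders; this illustrates the general pattern, not the IAD algorithm from that paper.

```python
# Generic sketch of verifier-guided best-of-N sampling with iterative
# refinement. The callables are hypothetical placeholders, not an API
# from the cited paper.
from typing import Callable, List, Tuple


def iterative_best_of_n(prompt: str,
                        generate: Callable[[str], str],
                        verify: Callable[[str, str], float],
                        refine: Callable[[str, str], str],
                        n: int = 8,
                        rounds: int = 3) -> Tuple[str, float]:
    """Sample n candidates per round, keep the one the verifier scores highest,
    and feed the current best back through a refinement step on later rounds."""
    best, best_score = "", float("-inf")
    for _ in range(rounds):
        # On later rounds, include a refined version of the current best.
        candidates: List[str] = [refine(prompt, best)] if best else []
        candidates += [generate(prompt) for _ in range(n - len(candidates))]
        for cand in candidates:
            score = verify(prompt, cand)  # higher means the verifier prefers it
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```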