Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints
- URL: http://arxiv.org/abs/2506.12421v2
- Date: Sun, 12 Oct 2025 13:53:29 GMT
- Title: Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints
- Authors: Dongjie Yang, Chengqiang Lu, Qimeng Wang, Xinbei Ma, Yan Gao, Yao Hu, Hai Zhao
- Abstract summary: This paper introduces the Multiple Aspects of Planning (MAoP) to solve planning problems with multifaceted constraints. Instead of direct planning, MAoP leverages a strategist to conduct pre-planning from various aspects and provide a planning blueprint for planners.
- Score: 39.01715254437105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning is characterized by the need to synthesize a broad spectrum of parallel and potentially conflicting information and constraints. For example, travel planning scenarios require the integration of diverse real-world information and user preferences. While LLMs show promise, existing methods with long-horizon thinking struggle to handle multifaceted constraints, leading to suboptimal solutions. Motivated by the challenges of real-world travel planning, this paper introduces the Multiple Aspects of Planning (MAoP), empowering LLMs with "wide-horizon thinking" to solve planning problems with multifaceted constraints. Instead of direct planning, MAoP leverages a strategist to conduct pre-planning from various aspects and provide a planning blueprint for planners, enabling strong inference-time scalability by scaling the number of aspects to consider various constraints. In addition, existing benchmarks for multi-constraint planning are flawed because they assess constraints in isolation, ignoring causal dependencies within the constraints, e.g., in travel planning, where past activities dictate the future itinerary. To address this, we propose Travel-Sim, an agent-based benchmark assessing plans via real-world simulation, thereby inherently resolving these causal dependencies. This paper advances LLM capabilities in complex planning and offers novel insights for evaluating sophisticated scenarios through simulation.
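The strategist/planner split described in the abstract could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, prompts, aspect list, and the mock LLM callable are all assumptions introduced here.

```python
def strategist(llm, query, aspects):
    """Pre-planning: analyze the query from each aspect and collect guidance.

    Scaling the number of aspects is how the abstract's "wide-horizon"
    inference-time scaling would work in this sketch.
    """
    blueprint = {}
    for aspect in aspects:
        blueprint[aspect] = llm(
            f"Considering {aspect}, outline constraints for: {query}"
        )
    return blueprint


def planner(llm, query, blueprint):
    """Generate the final plan conditioned on the aspect-wise blueprint."""
    guidance = "\n".join(f"- {a}: {g}" for a, g in blueprint.items())
    return llm(f"Plan for: {query}\nBlueprint:\n{guidance}")


def maop(llm, query, aspects):
    # Two stages instead of direct planning: pre-plan, then plan.
    return planner(llm, query, strategist(llm, query, aspects))


# Toy stand-in for an LLM call so the sketch runs end to end:
# it simply echoes the first line of the prompt.
mock_llm = lambda prompt: prompt.splitlines()[0]

plan = maop(mock_llm, "3-day trip to Kyoto",
            ["budget", "transit", "opening hours"])
```

With the echoing mock in place, `plan` is just the planner prompt's first line; a real deployment would substitute actual model calls and richer aspect prompts.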
Related papers
- DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints [25.987776928014707]
We introduce DeepPlanning, a benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization.
arXiv Detail & Related papers (2026-01-26T04:43:49Z) - TripTailor: A Real-World Benchmark for Personalized Travel Planning [28.965273870656446]
TripTailor is a benchmark for personalized travel planning in real-world scenarios. Its dataset features over 500,000 real-world points of interest (POIs) and nearly 4,000 diverse travel itineraries. We identify several critical challenges in travel planning, including feasibility, rationality, and personalized customization.
arXiv Detail & Related papers (2025-08-02T16:44:02Z) - MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models [42.30936364450115]
Multimodal planning capabilities refer to the ability to predict, reason, and design steps for task execution with multimodal context. Current benchmarks face two key challenges: (1) they cannot directly assess multimodal real-world planning capabilities, and (2) they lack constraints or implicit constraints across modalities. We introduce Multimodal Planning with Complex Constraints (MPCC), the first benchmark to systematically evaluate MLLMs' ability to handle multimodal constraints in planning.
arXiv Detail & Related papers (2025-07-31T09:59:17Z) - Decompose, Plan in Parallel, and Merge: A Novel Paradigm for Large Language Models based Planning with Multiple Constraints [31.631832677979826]
We propose a novel parallel planning paradigm, which Decomposes, Plans for subtasks in Parallel, and Merges subplans into a final plan (DPPM). Specifically, DPPM decomposes the complex task into subtasks based on constraints, generates the subplan for each subtask in parallel, and merges them into a global plan. Experimental results demonstrate that DPPM significantly outperforms existing methods in travel planning tasks.
arXiv Detail & Related papers (2025-06-03T09:33:13Z) - HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking [109.09735490692202]
We propose HyperTree Planning (HTP), a novel reasoning paradigm that constructs hypertree-structured planning outlines for effective planning. Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, a 3.6-times performance improvement over o1-preview.
arXiv Detail & Related papers (2025-05-05T02:38:58Z) - Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks [36.63527489464188]
Plan-and-Act is a framework that incorporates explicit planning into large language models (LLMs). Plan-and-Act consists of a Planner model, which generates structured, high-level plans to achieve user goals, and an Executor model, which translates these plans into environment-specific actions. We present a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark, as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.
arXiv Detail & Related papers (2025-03-12T17:40:52Z) - EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios [53.26658545922884]
We introduce EgoPlan-Bench2, a benchmark designed to assess the planning capabilities of MLLMs across a wide range of real-world scenarios. We evaluate 21 competitive MLLMs and provide an in-depth analysis of their limitations, revealing that they face significant challenges in real-world planning. Our approach enhances the performance of GPT-4V by 10.24 on EgoPlan-Bench2 without additional training.
arXiv Detail & Related papers (2024-12-05T18:57:23Z) - Smart Language Agents in Real-World Planning [0.0]
We seek to improve the travel-planning capability of Large Language Models (LLMs).
We propose a semi-automated prompt generation framework which combines LLM-automated prompts with a "human-in-the-loop".
Our results show that the LLM-automated prompt has limitations and that the "human-in-the-loop" greatly improves performance, by 139% with a single iteration.
arXiv Detail & Related papers (2024-07-29T03:00:30Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners [6.378824981027464]
Traditional approaches rely on problem formulation in a given formal language.
Recent Large Language Model (LLM) based approaches directly output plans from user requests using language.
We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners.
arXiv Detail & Related papers (2024-06-14T17:31:16Z) - What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models [7.216683826556268]
Large language models (LLMs) are increasingly used for applications that require planning capabilities.
We introduce SimPlan, a novel hybrid-method, and evaluate its performance in a new challenging setup.
arXiv Detail & Related papers (2024-02-18T07:42:49Z) - On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS) [23.024862968785147]
This paper investigates eight categories of unique LLM applications for addressing various aspects of planning problems.
A critical insight resulting from our review is that the true potential of LLMs unfolds when they are integrated with traditional symbolic planners.
arXiv Detail & Related papers (2024-01-04T19:22:09Z) - LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios with which existing planners struggle, and produces well-reasoned outputs while remaining grounded by working alongside the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z) - EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios.
We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning.
We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named "planning as in-painting".
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential to act as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.