TravelPlanner: A Benchmark for Real-World Planning with Language Agents
- URL: http://arxiv.org/abs/2402.01622v4
- Date: Wed, 23 Oct 2024 15:02:57 GMT
- Title: TravelPlanner: A Benchmark for Real-World Planning with Language Agents
- Authors: Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su,
- Abstract summary: We propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks-even GPT-4 only achieves a success rate of 0.6%.
- Score: 63.199454024966506
- License:
- Abstract: Planning has been part of the core pursuit for artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of the cognitive substrates necessary for human-level planning have been lacking. Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning. Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans. Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks-even GPT-4 only achieves a success rate of 0.6%. Language agents struggle to stay on task, use the right tools to collect information, or keep track of multiple constraints. However, we note that the mere possibility for language agents to tackle such a complex problem is in itself non-trivial progress. TravelPlanner provides a challenging yet meaningful testbed for future language agents.
Related papers
- Revealing the Barriers of Language Agents in Planning [44.913745512049246]
We show that current language agents still lack human-level planning abilities.
Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks.
We identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions.
arXiv Detail & Related papers (2024-10-16T09:44:38Z) - ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs [0.32141666878560626]
We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning.
ReasonPlanner significantly outperforms previous state-of-the-art prompting-based methods on the ScienceWorld benchmark by more than 1.8 times.
It relies solely on frozen weights thus requiring no gradient updates.
arXiv Detail & Related papers (2024-10-11T20:58:51Z) - Smart Language Agents in Real-World Planning [0.0]
We seek to improve the travel-planning capability of Large Language Models (LLMs)
We propose a semi-automated prompt generation framework which combines the LLM-automated prompt and "human-in-the-loop"
Our result shows that LLM automated prompt has its limitations and "human-in-the-loop" greatly improves the performance by $139%$ with one single iteration.
arXiv Detail & Related papers (2024-07-29T03:00:30Z) - Symbolic Learning Enables Self-Evolving Agents [55.625275970720374]
We introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own.
Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks.
arXiv Detail & Related papers (2024-06-26T17:59:18Z) - Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction.
We propose a novel multi-agent framework, Clarification-Execution-Planning (textttCEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z) - TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners [6.378824981027464]
Traditional approaches rely on problem formulation in a given formal language.
Recent Large Language Model (LLM) based approaches directly output plans from user requests using language.
We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners.
arXiv Detail & Related papers (2024-06-14T17:31:16Z) - KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [54.09074527006576]
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges.
This inadequacy primarily stems from the lack of built-in action knowledge in language agents.
We introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge.
arXiv Detail & Related papers (2024-03-05T16:39:12Z) - Learning adaptive planning representations with natural language
guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations.
Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planing Agent (TaPA) in embodied tasks for grounded planning with physical scene constraint.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework can achieve higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Comprehensive Multi-Agent Epistemic Planning [0.0]
This manuscript is focused on a specialized kind of planning known as Multi-agent Epistemic Planning (MEP).
EP refers to an automated planning setting where the agent reasons in the space of knowledge/beliefs states and tries to find a plan to reach a desirable state from a starting one.
Its general form, the MEP problem, involves multiple agents who need to reason about both the state of the world and the information flows between agents.
arXiv Detail & Related papers (2021-09-17T01:50:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.