TravelPlanner: A Benchmark for Real-World Planning with Language Agents
- URL: http://arxiv.org/abs/2402.01622v4
- Date: Wed, 23 Oct 2024 15:02:57 GMT
- Title: TravelPlanner: A Benchmark for Real-World Planning with Language Agents
- Authors: Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su
- Abstract summary: We propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks: even GPT-4 achieves a success rate of only 0.6%.
- Score: 63.199454024966506
- License:
- Abstract: Planning has been part of the core pursuit for artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of the cognitive substrates necessary for human-level planning have been lacking. Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning. Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans. Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks: even GPT-4 achieves a success rate of only 0.6%. Language agents struggle to stay on task, use the right tools to collect information, or keep track of multiple constraints. However, we note that the mere possibility for language agents to tackle such a complex problem is in itself non-trivial progress. TravelPlanner provides a challenging yet meaningful testbed for future language agents.
Related papers
- ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning [50.7898120693695]
We introduce ChinaTravel, a benchmark specifically designed for authentic Chinese travel planning scenarios.
We collect the travel requirements from questionnaires and propose a compositionally generalizable domain-specific language.
Empirical studies reveal the potential of neuro-symbolic agents in travel planning, achieving a constraint satisfaction rate of 27.9%.
We identify key challenges in real-world travel planning deployments, including open language reasoning and unseen concept composition.
arXiv Detail & Related papers (2024-12-18T10:10:12Z)
- EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios [53.26658545922884]
We introduce EgoPlan-Bench2, a benchmark designed to assess the planning capabilities of MLLMs across a wide range of real-world scenarios.
We evaluate 21 competitive MLLMs and provide an in-depth analysis of their limitations, revealing that they face significant challenges in real-world planning.
Our approach enhances the performance of GPT-4V by 10.24 on EgoPlan-Bench2 without additional training.
arXiv Detail & Related papers (2024-12-05T18:57:23Z)
- One STEP at a time: Language Agents are Stepwise Planners [9.877911778606014]
We introduce STEP, a framework designed to learn from previous experiences to enhance the planning capabilities of language agents.
STEP consistently outperforms state-of-the-art models in the ScienceWorld benchmark.
These findings highlight STEP's potential as a framework for enhancing planning capabilities in language agents.
arXiv Detail & Related papers (2024-11-13T08:32:42Z)
- Revealing the Barriers of Language Agents in Planning [44.913745512049246]
We show that current language agents still lack human-level planning abilities.
Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks.
We identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions.
arXiv Detail & Related papers (2024-10-16T09:44:38Z)
- ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs [0.32141666878560626]
We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning.
ReasonPlanner significantly outperforms previous state-of-the-art prompting-based methods on the ScienceWorld benchmark by more than 1.8 times.
It relies solely on frozen weights and thus requires no gradient updates.
arXiv Detail & Related papers (2024-10-11T20:58:51Z)
- Symbolic Learning Enables Self-Evolving Agents [55.625275970720374]
We introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own.
Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks.
arXiv Detail & Related papers (2024-06-26T17:59:18Z)
- Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction.
We propose a novel multi-agent framework, Clarification-Execution-Planning (CEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z)
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [52.348929737851165]
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges.
This inadequacy primarily stems from the lack of built-in action knowledge in language agents.
We introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge.
arXiv Detail & Related papers (2024-03-05T16:39:12Z)
- Comprehensive Multi-Agent Epistemic Planning [0.0]
This manuscript is focused on a specialized kind of planning known as Multi-agent Epistemic Planning (MEP).
EP refers to an automated planning setting where the agent reasons in the space of knowledge/beliefs states and tries to find a plan to reach a desirable state from a starting one.
Its general form, the MEP problem, involves multiple agents who need to reason about both the state of the world and the information flows between agents.
arXiv Detail & Related papers (2021-09-17T01:50:18Z)