ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning
- URL: http://arxiv.org/abs/2412.13682v2
- Date: Fri, 20 Dec 2024 15:08:25 GMT
- Title: ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning
- Authors: Jie-Jing Shao, Xiao-Wen Yang, Bo-Wen Zhang, Baizhi Chen, Wen-Da Wei, Guohao Cai, Zhenhua Dong, Lan-Zhe Guo, Yu-feng Li,
- Abstract summary: We introduce ChinaTravel, a benchmark specifically designed for authentic Chinese travel planning scenarios.
We collect the travel requirements from questionnaires and propose a compositionally generalizable domain-specific language.
Empirical studies reveal the potential of neuro-symbolic agents in travel planning, achieving a constraint satisfaction rate of 27.9%.
We identify key challenges in real-world travel planning deployments, including open language reasoning and unseen concept composition.
- Score: 50.7898120693695
- License:
- Abstract: Recent advances in LLMs, particularly in language reasoning and tool integration, have rapidly sparked the real-world development of Language Agents. Among these, travel planning represents a prominent domain, combining academic challenges with practical value due to its complexity and market demand. However, existing benchmarks fail to reflect the diverse, real-world requirements crucial for deployment. To address this gap, we introduce ChinaTravel, a benchmark specifically designed for authentic Chinese travel planning scenarios. We collect the travel requirements from questionnaires and propose a compositionally generalizable domain-specific language that enables a scalable evaluation process, covering feasibility, constraint satisfaction, and preference comparison. Empirical studies reveal the potential of neuro-symbolic agents in travel planning, achieving a constraint satisfaction rate of 27.9%, significantly surpassing purely neural models at 2.6%. Moreover, we identify key challenges in real-world travel planning deployments, including open language reasoning and unseen concept composition. These findings highlight the significance of ChinaTravel as a pivotal milestone for advancing language agents in complex, real-world planning scenarios.
Related papers
- EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios [53.26658545922884]
We introduce EgoPlan-Bench2, a benchmark designed to assess the planning capabilities of MLLMs across a wide range of real-world scenarios.
We evaluate 21 competitive MLLMs and provide an in-depth analysis of their limitations, revealing that they face significant challenges in real-world planning.
Our approach enhances the performance of GPT-4V by 10.24 on EgoPlan-Bench2 without additional training.
arXiv Detail & Related papers (2024-12-05T18:57:23Z) - To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning [54.9340658451129]
To the Globe (TTG) is a real-time demo system that takes natural language requests from users and translates it to symbolic form.
The overall system takes 5 seconds to reply to the user request with guaranteed itineraries.
When evaluated by users, TTG achieves consistently high Net Promoter Scores (NPS) of 35-40% on generated itinerary.
arXiv Detail & Related papers (2024-10-21T19:30:05Z) - TravelAgent: An AI Assistant for Personalized Travel Planning [36.046107116324826]
We introduce TravelAgent, a travel planning system powered by large language models (LLMs)
TravelAgent comprises four modules: Tool-usage, Recommendation, Planning, and Memory Module.
We evaluate TravelAgent's performance with human and simulated users, demonstrating its overall effectiveness in three criteria and confirming the accuracy of personalized recommendations.
arXiv Detail & Related papers (2024-09-12T14:24:45Z) - Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction.
We propose a novel multi-agent framework, Clarification-Execution-Planning (textttCEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z) - TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners [6.378824981027464]
Traditional approaches rely on problem formulation in a given formal language.
Recent Large Language Model (LLM) based approaches directly output plans from user requests using language.
We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners.
arXiv Detail & Related papers (2024-06-14T17:31:16Z) - Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools [12.875270710153021]
Large Language Models (LLMs) struggle to directly generate correct plans for complex multi-constraint planning problems.
We propose an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems.
Our framework achieves a success rate of 93.9% and is effective with diverse paraphrased prompts.
arXiv Detail & Related papers (2024-04-18T04:36:37Z) - TravelPlanner: A Benchmark for Real-World Planning with Language Agents [63.199454024966506]
We propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks-even GPT-4 only achieves a success rate of 0.6%.
arXiv Detail & Related papers (2024-02-02T18:39:51Z) - DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation [107.5934592892763]
We propose DREAMWALKER -- a world model based VLN-CE agent.
The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment.
It can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions.
arXiv Detail & Related papers (2023-08-14T23:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.