TripTailor: A Real-World Benchmark for Personalized Travel Planning
- URL: http://arxiv.org/abs/2508.01432v1
- Date: Sat, 02 Aug 2025 16:44:02 GMT
- Title: TripTailor: A Real-World Benchmark for Personalized Travel Planning
- Authors: Yuanzhe Shen, Kaimin Wang, Changze Lv, Xiaoqing Zheng, Xuanjing Huang,
- Abstract summary: TripTailor is a benchmark for personalized travel planning in real-world scenarios.<n>This dataset features over 500,000 real-world points of interest (POIs) and nearly 4,000 diverse travel itineraries.<n>We identify several critical challenges in travel planning, including the feasibility, rationality, and personalized customization.
- Score: 28.965273870656446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The continuous evolution and enhanced reasoning capabilities of large language models (LLMs) have elevated their role in complex tasks, notably in travel planning, where demand for personalized, high-quality itineraries is rising. However, current benchmarks often rely on unrealistic simulated data, failing to reflect the differences between LLM-generated and real-world itineraries. Existing evaluation metrics, which primarily emphasize constraints, fall short of providing a comprehensive assessment of the overall quality of travel plans. To address these limitations, we introduce TripTailor, a benchmark designed specifically for personalized travel planning in real-world scenarios. This dataset features an extensive collection of over 500,000 real-world points of interest (POIs) and nearly 4,000 diverse travel itineraries, complete with detailed information, providing a more authentic evaluation framework. Experiments show that fewer than 10\% of the itineraries generated by the latest state-of-the-art LLMs achieve human-level performance. Moreover, we identify several critical challenges in travel planning, including the feasibility, rationality, and personalized customization of the proposed solutions. We hope that TripTailor will drive the development of travel planning agents capable of understanding and meeting user needs while generating practical itineraries. Our code and dataset are available at https://github.com/swxkfm/TripTailor
Related papers
- Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM [58.50687282180444]
Travel planning is a complex task requiring the integration of diverse real-world information and user preferences.<n>We formulate this as an $L3$ planning problem, emphasizing long context, long instruction, and long output.<n>We introduce Multiple Aspects of Planning (MAoP), enabling LLMs to conduct wide-horizon thinking to solve complex planning problems.
arXiv Detail & Related papers (2025-06-14T09:37:59Z) - TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning [7.841787597078323]
TripCraft establishes a new benchmark for LLM driven personalized travel planning, offering a more realistic, constraint aware framework for itinerary generation.<n>Our parameter informed setting significantly enhances meal scheduling, improving the Temporal Meal Score from 61% to 80% in a 7 day scenario.
arXiv Detail & Related papers (2025-02-27T20:33:28Z) - ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning [49.37899519520761]
We introduce emphChinaTravel, the first open-ended benchmark grounded in authentic Chinese travel requirements.<n>We design a compositionally generalizable domain-specific language for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison.<n> Empirical studies reveal the potential of neuro-symbolic agents in travel planning, achieving a 37.0% constraint satisfaction rate on human queries.
arXiv Detail & Related papers (2024-12-18T10:10:12Z) - EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios [53.26658545922884]
We introduce EgoPlan-Bench2, a benchmark designed to assess the planning capabilities of MLLMs across a wide range of real-world scenarios.<n>We evaluate 21 competitive MLLMs and provide an in-depth analysis of their limitations, revealing that they face significant challenges in real-world planning.<n>Our approach enhances the performance of GPT-4V by 10.24 on EgoPlan-Bench2 without additional training.
arXiv Detail & Related papers (2024-12-05T18:57:23Z) - To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning [54.9340658451129]
To the Globe (TTG) is a real-time demo system that takes natural language requests from users and translates it to symbolic form.
The overall system takes 5 seconds to reply to the user request with guaranteed itineraries.
When evaluated by users, TTG achieves consistently high Net Promoter Scores (NPS) of 35-40% on generated itinerary.
arXiv Detail & Related papers (2024-10-21T19:30:05Z) - TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners [6.378824981027464]
Traditional approaches rely on problem formulation in a given formal language.
Recent Large Language Model (LLM) based approaches directly output plans from user requests using language.
We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners.
arXiv Detail & Related papers (2024-06-14T17:31:16Z) - TravelPlanner: A Benchmark for Real-World Planning with Language Agents [63.199454024966506]
We propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks-even GPT-4 only achieves a success rate of 0.6%.
arXiv Detail & Related papers (2024-02-02T18:39:51Z) - End-to-end Interpretable Neural Motion Planner [78.69295676456085]
We propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios.
We design a holistic model that takes as input raw LIDAR data and a HD map and produces interpretable intermediate representations.
We demonstrate the effectiveness of our approach in real-world driving data captured in several cities in North America.
arXiv Detail & Related papers (2021-01-17T14:16:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.