TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning
- URL: http://arxiv.org/abs/2502.20508v1
- Date: Thu, 27 Feb 2025 20:33:28 GMT
- Title: TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning
- Authors: Soumyabrata Chaudhuri, Pranav Purkar, Ritwik Raghav, Shubhojit Mallick, Manish Gupta, Abhik Jana, Shreya Ghosh,
- Abstract summary: TripCraft establishes a new benchmark for LLM driven personalized travel planning, offering a more realistic, constraint aware framework for itinerary generation.<n>Our parameter informed setting significantly enhances meal scheduling, improving the Temporal Meal Score from 61% to 80% in a 7 day scenario.
- Score: 7.841787597078323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in probing Large Language Models (LLMs) have explored their latent potential as personalized travel planning agents, yet existing benchmarks remain limited in real world applicability. Existing datasets, such as TravelPlanner and TravelPlanner+, suffer from semi synthetic data reliance, spatial inconsistencies, and a lack of key travel constraints, making them inadequate for practical itinerary generation. To address these gaps, we introduce TripCraft, a spatiotemporally coherent travel planning dataset that integrates real world constraints, including public transit schedules, event availability, diverse attraction categories, and user personas for enhanced personalization. To evaluate LLM generated plans beyond existing binary validation methods, we propose five continuous evaluation metrics, namely Temporal Meal Score, Temporal Attraction Score, Spatial Score, Ordering Score, and Persona Score which assess itinerary quality across multiple dimensions. Our parameter informed setting significantly enhances meal scheduling, improving the Temporal Meal Score from 61% to 80% in a 7 day scenario. TripCraft establishes a new benchmark for LLM driven personalized travel planning, offering a more realistic, constraint aware framework for itinerary generation. Dataset and Codebase will be made publicly available upon acceptance.
Related papers
- TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning [39.934634038758404]
This paper introduces TP-RAG, the first benchmark tailored retrieval-augmentedtemporalRAG-aware travel planning.
Our dataset includes 2,348 real-world travel queries, 85,575 fine-grain POIs, 18,784 annotated POIs.
arXiv Detail & Related papers (2025-04-11T17:02:40Z) - ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning [50.7898120693695]
We introduce ChinaTravel, a benchmark specifically designed for authentic Chinese travel planning scenarios.<n>We collect the travel requirements from questionnaires and propose a compositionally generalizable domain-specific language.<n> Empirical studies reveal the potential of neuro-symbolic agents in travel planning, achieving a constraint satisfaction rate of 27.9%.<n>We identify key challenges in real-world travel planning deployments, including open language reasoning and unseen concept composition.
arXiv Detail & Related papers (2024-12-18T10:10:12Z) - To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning [54.9340658451129]
To the Globe (TTG) is a real-time demo system that takes natural language requests from users and translates it to symbolic form.
The overall system takes 5 seconds to reply to the user request with guaranteed itineraries.
When evaluated by users, TTG achieves consistently high Net Promoter Scores (NPS) of 35-40% on generated itinerary.
arXiv Detail & Related papers (2024-10-21T19:30:05Z) - Reconsidering utility: unveiling the limitations of synthetic mobility data generation algorithms in real-life scenarios [49.1574468325115]
We evaluate the utility of five state-of-the-art synthesis approaches in terms of real-world applicability.
We focus on so-called trip data that encode fine granular urban movements such as GPS-tracked taxi rides.
One model fails to produce data within reasonable time and another generates too many jumps to meet the requirements for map matching.
arXiv Detail & Related papers (2024-07-03T16:08:05Z) - TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners [6.378824981027464]
Traditional approaches rely on problem formulation in a given formal language.
Recent Large Language Model (LLM) based approaches directly output plans from user requests using language.
We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners.
arXiv Detail & Related papers (2024-06-14T17:31:16Z) - NATURAL PLAN: Benchmarking LLMs on Natural Language Planning [109.73382347588417]
We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling.
We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models.
arXiv Detail & Related papers (2024-06-06T21:27:35Z) - Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools [12.875270710153021]
Large Language Models (LLMs) struggle to directly generate correct plans for complex multi-constraint planning problems.<n>We propose an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems.<n>Our framework achieves a success rate of 93.9% and is effective with diverse paraphrased prompts.
arXiv Detail & Related papers (2024-04-18T04:36:37Z) - TravelPlanner: A Benchmark for Real-World Planning with Language Agents [63.199454024966506]
We propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks-even GPT-4 only achieves a success rate of 0.6%.
arXiv Detail & Related papers (2024-02-02T18:39:51Z) - Differentiable Spatial Planning using Transformers [87.90709874369192]
We propose Spatial Planning Transformers (SPT), which given an obstacle map learns to generate actions by planning over long-range spatial dependencies.
In the setting where the ground truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework.
SPTs outperform prior state-of-the-art differentiable planners across all the setups for both manipulation and navigation tasks.
arXiv Detail & Related papers (2021-12-02T06:48:16Z) - nuPlan: A closed-loop ML-based planning benchmark for autonomous
vehicles [7.212066200339641]
We propose the world's first closed-loop ML-based planning benchmark for autonomous driving.
We provide a high-quality dataset with 1500h of human driving data from 4 cities across the US and Asia.
We plan to release the dataset at NeurIPS 2021 and organize benchmark challenges starting in early 2022.
arXiv Detail & Related papers (2021-06-22T14:24:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.