Related papers: TripTide: A Benchmark for Adaptive Travel Planning under Disruptions

URL: http://arxiv.org/abs/2510.21329v1
Date: Fri, 24 Oct 2025 10:39:55 GMT
Title: TripTide: A Benchmark for Adaptive Travel Planning under Disruptions
Authors: Priyanshu Karmakar, Soumyabrata Chaudhuri, Shubhojit Mallick, Manish Gupta, Abhik Jana, Shreya Ghosh
Abstract summary: TripTide is the first benchmark evaluating Large Language Models' ability to revise itineraries under realistic disruptions. Our experiments show that LLMs maintain strong sequential consistency and semantic stability, while spatial deviations are larger for shorter trips but decrease with longer ones. TripTide establishes a benchmark for evaluating adaptability, personalization, and resilience in LLM-based travel planning under real-world uncertainty.
Score: 8.592189274445149
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent efforts like TripCraft and TravelPlanner have advanced the use of Large Language Models (LLMs) for personalized, constraint-aware travel itinerary generation. Yet, real travel often faces disruptions. To address this, we present TripTide, the first benchmark evaluating LLMs' ability to revise itineraries under realistic disruptions. TripTide models key dimensions such as disruption severity and traveler tolerance, enabling nuanced assessment of LLM adaptability to events like flight cancellations, weather closures, or overbooked attractions. We conduct a threefold evaluation. First, we introduce automatic metrics including Preservation of Intent (how well the revised plan maintains feasibility and goals), Responsiveness (promptness and appropriateness of disruption handling), and Adaptability (semantic, spatial, and sequential divergence between original and revised plans). Second, we apply an LLM-as-a-judge approach to automatically assess revision quality. Third, we perform manual expert evaluation to verify whether revisions preserve semantic, spatial, sequential, and responsive aspects. Our experiments show that LLMs maintain strong sequential consistency and semantic stability, while spatial deviations are larger for shorter trips but decrease with longer ones, indicating that extended plans encourage better geographic coherence. However, disruption-handling ability declines as plan length increases, highlighting limits in LLM robustness. TripTide establishes a benchmark for evaluating adaptability, personalization, and resilience in LLM-based travel planning under real-world uncertainty.
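
Illustrative sketch (not from the paper): the abstract names spatial and sequential divergence between the original and revised plans but does not give formulas here. The Python below shows one plausible way such quantities could be computed. The plan layout (a list of stops with "name" and "coord" keys), the function names, and the choice of haversine distance and a difflib similarity ratio are all assumptions made for illustration, not TripTide's actual definitions.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def spatial_divergence_km(original, revised):
    """Hypothetical metric: mean distance between stops at the same position."""
    pairs = list(zip(original, revised))
    if not pairs:
        return 0.0
    return sum(haversine_km(o["coord"], r["coord"]) for o, r in pairs) / len(pairs)


def sequential_divergence(original, revised):
    """Hypothetical metric: 1 minus the order-aware similarity of stop names."""
    a = [stop["name"] for stop in original]
    b = [stop["name"] for stop in revised]
    return 1.0 - SequenceMatcher(None, a, b).ratio()


# Made-up example: the middle stop is replaced after a disruption.
original_plan = [
    {"name": "Museum", "coord": (48.8606, 2.3376)},
    {"name": "Tower", "coord": (48.8584, 2.2945)},
    {"name": "Cathedral", "coord": (48.8530, 2.3499)},
]
revised_plan = [
    {"name": "Museum", "coord": (48.8606, 2.3376)},
    {"name": "Garden", "coord": (48.8462, 2.3372)},
    {"name": "Cathedral", "coord": (48.8530, 2.3499)},
]

print(spatial_divergence_km(original_plan, revised_plan))
print(sequential_divergence(original_plan, revised_plan))
```

Under these assumptions, identical plans score zero on both measures, and replacing a single stop raises both. A semantic measure would additionally need a text-embedding model, which is omitted here.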

Related papers

iTIMO: An LLM-empowered Synthesis Dataset for Travel Itinerary Modification [20.2135943012742]
iTIMO is a pipeline that frames the generation of need-to-modify itinerary data as an intent-driven perturbation task. It instructs large language models to perturb real-world itineraries using three operations: REPLACE, ADD, and DELETE. Overall, iTIMO provides a comprehensive testbed for the modification task, and empowers the evolution of traditional travel recommender systems.
arXiv Detail & Related papers (2026-01-15T17:24:51Z)
TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation [4.831964966659024]
We introduce a comprehensive benchmark for travel planning that unifies fine-grained criteria into a single reward. Our evaluator achieves moderate agreement with travel-expert annotations (60.75%). We release a large-scale dataset of 4,870 queries including 219 real-world, free-form requests for generalization to authentic user intent.
arXiv Detail & Related papers (2025-10-10T05:22:29Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning [53.065247112514534]
ATLAS is a general multi-agent framework designed to handle the complex nature of constraint awareness in real-world travel planning tasks. We demonstrate state-of-the-art performance on the TravelPlanner benchmark, improving the final pass rate from 23.3% to 44.4% over its best alternative.
arXiv Detail & Related papers (2025-09-29T23:23:52Z)
Iti-Validator: A Guardrail Framework for Validating and Correcting LLM-Generated Itineraries [0.0]
This research aims to study the temporal performance of different Large Language Models (LLMs). It presents a validation framework that evaluates and improves the temporal consistency of LLM-generated travel itineraries.
arXiv Detail & Related papers (2025-09-04T06:11:57Z)
TripTailor: A Real-World Benchmark for Personalized Travel Planning [28.965273870656446]
TripTailor is a benchmark for personalized travel planning in real-world scenarios. This dataset features over 500,000 real-world points of interest (POIs) and nearly 4,000 diverse travel itineraries. We identify several critical challenges in travel planning, including feasibility, rationality, and personalized customization.
arXiv Detail & Related papers (2025-08-02T16:44:02Z)
Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints [39.01715254437105]
This paper introduces the Multiple Aspects of Planning (MAoP) to solve planning problems with multifaceted constraints. Instead of direct planning, MAoP leverages the strategist to conduct pre-planning from various aspects and provide the planning blueprint for planners.
arXiv Detail & Related papers (2025-06-14T09:37:59Z)
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling [74.41886258801209]
We propose a two-stage trajectory planning framework that decouples principle alignment from behavior learning. Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-05-23T09:22:19Z)
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning [39.934634038758404]
This paper introduces TP-RAG, the first benchmark tailored for retrieval-augmented, spatiotemporal-aware travel planning. Our dataset includes 2,348 real-world travel queries, 85,575 fine-grained POIs, and 18,784 annotated POIs.
arXiv Detail & Related papers (2025-04-11T17:02:40Z)
Centaur: Robust End-to-End Autonomous Driving with Test-Time Training [84.78837437133234]
We propose Centaur, which updates a planner's behavior via test-time training without relying on hand-engineered rules or cost functions. We develop a novel uncertainty measure, called Cluster Entropy, which is simple, interpretable, and compatible with state-of-the-art planning algorithms.
arXiv Detail & Related papers (2025-03-14T17:59:41Z)
TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning [7.841787597078323]
TripCraft establishes a new benchmark for LLM-driven personalized travel planning, offering a more realistic, constraint-aware framework for itinerary generation. Our parameter-informed setting significantly enhances meal scheduling, improving the Temporal Meal Score from 61% to 80% in a 7-day scenario.
arXiv Detail & Related papers (2025-02-27T20:33:28Z)
Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference [53.419249906014194]
We study generative modeling for planning with datasets repurposed from offline reinforcement learning. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return.
arXiv Detail & Related papers (2024-02-07T08:18:09Z)
LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency [53.8779374188643]
We propose a principled framework with provable regret guarantees to orchestrate reasoning and acting. Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon. At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
arXiv Detail & Related papers (2023-09-29T16:36:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.