RETAIL: Towards Real-world Travel Planning for Large Language Models
- URL: http://arxiv.org/abs/2508.15335v1
- Date: Thu, 21 Aug 2025 08:08:38 GMT
- Title: RETAIL: Towards Real-world Travel Planning for Large Language Models
- Authors: Bin Deng, Yizhe Feng, Zeming Liu, Qing Wei, Xiangrong Zhu, Shuai Chen, Yuanfang Guo, Yunhong Wang
- Abstract summary: We present a novel dataset, RETAIL, which supports decision-making for implicit queries while covering explicit queries. It also enables environmental awareness to ensure plan feasibility under real-world scenarios, while incorporating detailed POI information for all-in-one travel plans. Our experiments reveal that even the strongest existing model achieves merely a 1.0% pass rate, indicating real-world travel planning remains extremely challenging.
- Score: 36.75531019697594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large language models have enhanced automated travel planning abilities, current systems remain misaligned with real-world scenarios. First, they assume users provide explicit queries, while in reality requirements are often implicit. Second, existing solutions ignore diverse environmental factors and user preferences, limiting the feasibility of plans. Third, systems can only generate plans with basic POI arrangements, failing to provide all-in-one plans with rich details. To mitigate these challenges, we construct a novel dataset, RETAIL, which supports decision-making for implicit queries while covering explicit queries, both with and without revision needs. It also enables environmental awareness to ensure plan feasibility under real-world scenarios, while incorporating detailed POI information for all-in-one travel plans. Furthermore, we propose a topic-guided multi-agent framework, termed TGMA. Our experiments reveal that even the strongest existing model achieves merely a 1.0% pass rate, indicating real-world travel planning remains extremely challenging. In contrast, TGMA demonstrates substantially improved performance (2.72%), offering promising directions for real-world travel planning.
Related papers
- ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning [53.065247112514534]
ATLAS is a general multi-agent framework designed to handle the complex nature of constraint awareness in real-world travel planning tasks. We demonstrate state-of-the-art performance on the TravelPlanner benchmark, improving the final pass rate from 23.3% to 44.4% over its best alternative.
arXiv Detail & Related papers (2025-09-29T23:23:52Z)
- TripTailor: A Real-World Benchmark for Personalized Travel Planning [28.965273870656446]
TripTailor is a benchmark for personalized travel planning in real-world scenarios. This dataset features over 500,000 real-world points of interest (POIs) and nearly 4,000 diverse travel itineraries. We identify several critical challenges in travel planning, including feasibility, rationality, and personalized customization.
arXiv Detail & Related papers (2025-08-02T16:44:02Z)
- Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM [58.50687282180444]
Travel planning is a complex task requiring the integration of diverse real-world information and user preferences. We formulate this as an $L^3$ planning problem, emphasizing long context, long instruction, and long output. We introduce Multiple Aspects of Planning (MAoP), enabling LLMs to conduct wide-horizon thinking to solve complex planning problems.
arXiv Detail & Related papers (2025-06-14T09:37:59Z)
- EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios [53.26658545922884]
We introduce EgoPlan-Bench2, a benchmark designed to assess the planning capabilities of MLLMs across a wide range of real-world scenarios. We evaluate 21 competitive MLLMs and provide an in-depth analysis of their limitations, revealing that they face significant challenges in real-world planning. Our approach enhances the performance of GPT-4V by 10.24 on EgoPlan-Bench2 without additional training.
arXiv Detail & Related papers (2024-12-05T18:57:23Z)
- TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners [6.378824981027464]
Traditional approaches rely on problem formulation in a given formal language.
Recent Large Language Model (LLM) based approaches directly output plans from user requests using language.
We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners.
arXiv Detail & Related papers (2024-06-14T17:31:16Z)
- PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models.
We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z)
- Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools [12.875270710153021]
Large Language Models (LLMs) struggle to directly generate correct plans for complex multi-constraint planning problems. We propose an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems. Our framework achieves a success rate of 93.9% and is effective with diverse paraphrased prompts.
arXiv Detail & Related papers (2024-04-18T04:36:37Z)
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents [63.199454024966506]
We propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that current language agents are not yet capable of handling such complex planning tasks: even GPT-4 achieves a success rate of only 0.6%.
arXiv Detail & Related papers (2024-02-02T18:39:51Z)
- AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated potential to act as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.