A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism
- URL: http://arxiv.org/abs/2505.11533v1
- Date: Wed, 14 May 2025 02:36:17 GMT
- Title: A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism
- Authors: Jinqiang Wang, Huansheng Ning, Tao Zhu, Jianguo Ding
- Abstract summary: In the tourism domain, Large Language Models (LLMs) often struggle to mine implicit user intentions from tourists' ambiguous inquiries. We propose SynPT, which constructs an LLM-driven user agent and assistant agent to simulate dialogues based on seed data collected from Chinese tourism websites.
- Score: 6.387945824899046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the tourism domain, Large Language Models (LLMs) often struggle to mine implicit user intentions from tourists' ambiguous inquiries and lack the capacity to proactively guide users toward clarifying their needs. A critical bottleneck is the scarcity of high-quality training datasets that facilitate proactive questioning and implicit intention mining. While recent advances leverage LLM-driven data synthesis to generate such datasets and transfer specialized knowledge to downstream models, existing approaches suffer from several shortcomings: (1) lack of adaptation to the tourism domain, (2) skewed distributions of detail levels in initial inquiries, (3) contextual redundancy in the implicit intention mining module, and (4) lack of explicit thinking about tourists' emotions and intention values. Therefore, we propose SynPT (a Data Synthesis Method Driven by LLMs for Proactive Mining of Implicit User Intentions in Tourism), which constructs an LLM-driven user agent and assistant agent to simulate dialogues based on seed data collected from Chinese tourism websites. This approach addresses the aforementioned limitations and generates SynPT-Dialog, a training dataset containing explicit reasoning. The dataset is used to fine-tune a general LLM, enabling it to proactively mine implicit user intentions. Experimental evaluations, conducted from both human and LLM perspectives, demonstrate the superiority of SynPT over existing methods. Furthermore, we analyze key hyperparameters and present case studies to illustrate the practical applicability of our method, including discussions on its adaptability to English-language scenarios. All code and data are publicly available.
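The two-agent simulation loop the abstract describes can be pictured with a short sketch. This is not the authors' code: `call_llm` stands in for any chat-completion client, and the prompts, the emotion/intention-value instruction, and the stopping rule are illustrative assumptions.

```python
# Minimal sketch of an LLM-driven user-agent / assistant-agent dialogue
# synthesis loop in the spirit of SynPT. All prompt wording is invented.

def call_llm(system_prompt: str, history: list[str]) -> str:
    """Placeholder: send `history` to an LLM under `system_prompt`, return its reply."""
    raise NotImplementedError("plug in a real LLM client here")

def synthesize_dialogue(seed_inquiry: str, max_turns: int = 6) -> list[str]:
    user_sys = (
        "You are a tourist whose real need is only hinted at in this seed "
        f"inquiry from a Chinese tourism website: {seed_inquiry!r}. "
        "Answer questions naturally, revealing details only when asked."
    )
    assistant_sys = (
        "You are a travel assistant. Before each turn, reason explicitly "
        "about the tourist's emotions and the value of candidate intentions; "
        "then ask one clarifying question, or reply FINAL:<intention> when sure."
    )
    dialogue = [f"user: {seed_inquiry}"]
    for _ in range(max_turns):
        reply = call_llm(assistant_sys, dialogue)
        dialogue.append(f"assistant: {reply}")
        if reply.startswith("FINAL:"):   # implicit intention surfaced
            break
        dialogue.append(f"user: {call_llm(user_sys, dialogue)}")
    return dialogue                      # one SynPT-Dialog-style training sample
```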
Related papers
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners. In this work, we introduce a new task paradigm: proactive information gathering. We design a scalable framework that generates partially specified, real-world tasks, masking key information. Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z)
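A minimal sketch of the reward idea described above, assuming a question is credited once for each masked task field that the simulated user's reply newly reveals; the field names and string-matching rule are invented for illustration:

```python
def question_reward(masked_fields: set[str], revealed_before: set[str],
                    user_answer: str) -> float:
    """Credit a question once per masked field its answer newly reveals."""
    answer = user_answer.lower()
    newly_revealed = {f for f in masked_fields - revealed_before
                      if f.replace("_", " ") in answer}
    return float(len(newly_revealed))   # redundant questions earn nothing

# Example: "budget" was already known, so only "travel_dates" counts.
print(question_reward({"budget", "travel_dates"}, {"budget"},
                      "We are flexible on travel dates in June."))  # 1.0
```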
- Aligning LLM with human travel choices: a persona-based embedding learning approach [15.11130742093296]
This paper introduces a novel framework for aligning large language models with human travel choice behavior. Our framework uses a persona inference and loading process to condition LLMs with suitable prompts to enhance alignment.
arXiv Detail & Related papers (2025-05-25T06:54:01Z)
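The "persona loading" step might look like the following sketch, where an inferred persona conditions the model through its prompt; the persona fields and template are assumptions, not the paper's embedding-learning method:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    income_level: str
    trip_purpose: str
    mode_preference: str

def persona_prompt(persona: Persona, choice_question: str) -> str:
    """Condition a travel-choice question on an inferred persona."""
    return (
        f"You are a traveler with {persona.income_level} income, "
        f"traveling for {persona.trip_purpose}, who usually prefers "
        f"{persona.mode_preference}.\n{choice_question}"
    )

print(persona_prompt(Persona("middle", "commuting", "public transit"),
                     "Choose between driving (25 min) and the bus (40 min)."))
```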
- From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System [49.57258257916805]
Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities. However, practical applications often favor smaller, internally managed recommender models due to scalability, interpretability, and data privacy constraints. We propose an active data augmentation framework that synthesizes conversational training data by leveraging black-box LLMs guided by active learning techniques.
arXiv Detail & Related papers (2025-04-21T23:05:47Z)
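A sketch of the active-learning flavor of this synthesis, assuming reviews are prioritized by the downstream model's uncertainty before a black-box LLM converts them into dialogues; both helper functions are placeholders, not the paper's API:

```python
def student_confidence(review: str) -> float:
    """Placeholder: the current recommender's confidence on this review."""
    raise NotImplementedError

def llm_review_to_dialog(review: str) -> list[str]:
    """Placeholder: black-box LLM rewriting a review as a user-system dialogue."""
    raise NotImplementedError

def active_synthesis(reviews: list[str], budget: int) -> list[list[str]]:
    # Spend the synthesis budget on the reviews the student is least sure about.
    ranked = sorted(reviews, key=student_confidence)
    return [llm_review_to_dialog(r) for r in ranked[:budget]]
```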
- SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders [6.910185679055651]
This paper introduces SynthTRIPs, a novel framework for generating synthetic travel queries using Large Language Models (LLMs). Our approach combines persona-based preferences (e.g., budget, travel style) with explicit sustainability filters (e.g., walkability, air quality) to produce realistic and diverse queries. While our framework was developed and tested for personalized city trip recommendations, the methodology applies to other recommender system domains.
arXiv Detail & Related papers (2025-04-12T16:48:35Z)
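The persona-plus-filters combination can be illustrated with a toy prompt builder; the field names and wording are assumptions, not SynthTRIPs' actual templates:

```python
def build_query_prompt(persona: dict, filters: dict) -> str:
    """Combine persona preferences with sustainability constraints into one prompt."""
    prefs = ", ".join(f"{k}: {v}" for k, v in persona.items())
    constraints = ", ".join(f"{k} >= {v}" for k, v in filters.items())
    return (
        "Write one realistic city-trip query from a traveler with "
        f"({prefs}) that respects sustainability constraints ({constraints})."
    )

print(build_query_prompt({"budget": "low", "travel style": "slow travel"},
                         {"walkability": 0.8, "air quality": 0.7}))
```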
- A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection [0.0]
Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails suffer from high false-positive rates, limited adaptability, and the impracticality of requiring real-world data that is not available in pre-production. We introduce a flexible, data-free guardrail development methodology that addresses these challenges.
arXiv Detail & Related papers (2024-11-20T00:31:23Z)
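One data-free reading of this recipe: have an LLM synthesize on-topic and off-topic prompts, then fit a tiny classifier over their embeddings. The sketch below uses a nearest-centroid rule with a placeholder `embed` function; none of this is the paper's exact pipeline:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: any sentence-embedding model."""
    raise NotImplementedError

def centroid(prompts: list[str]) -> np.ndarray:
    """Average embedding of a batch of (synthetic) prompts."""
    return np.mean([embed(p) for p in prompts], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_off_topic(prompt: str, on_centroid: np.ndarray,
                 off_centroid: np.ndarray) -> bool:
    v = embed(prompt)
    return cosine(v, off_centroid) > cosine(v, on_centroid)
```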
- Mobility-LLM: Learning Visiting Intentions and Travel Preferences from Human Mobility Data with Large Language Models [22.680033463634732]
Location-based services (LBS) have accumulated extensive human mobility data on diverse behaviors through check-in sequences.
Yet, existing models analyzing check-in sequences fail to consider the semantics contained in these sequences.
We present Mobility-LLM, a novel framework that leverages large language models to analyze check-in sequences for multiple tasks.
arXiv Detail & Related papers (2024-10-29T01:58:06Z)
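A minimal sketch of the underlying idea of making check-in semantics legible to an LLM by verbalizing the sequence; the template is an assumption, not Mobility-LLM's actual reprogramming layer:

```python
from datetime import datetime

def verbalize_checkins(checkins: list[tuple[str, str, datetime]]) -> str:
    """Render (venue, category, time) check-ins as text an LLM can reason over."""
    lines = [f"{t:%a %H:%M}: visited {name} ({category})"
             for name, category, t in checkins]
    return ("Check-in history:\n" + "\n".join(lines) +
            "\nWhat is the likely intention of the next visit?")

print(verbalize_checkins([
    ("Joe's Gym", "fitness", datetime(2024, 5, 3, 7, 30)),
    ("Bean Cafe", "coffee shop", datetime(2024, 5, 3, 9, 0)),
]))
```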
- LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
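The knowledge-adaptation idea can be sketched as a small trainable adapter projecting frozen LLM item embeddings into the collaborative-filtering space; the dimensions and linear form are illustrative assumptions, not LEARN's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
llm_dim, cf_dim = 4096, 64                          # assumed sizes
W = rng.normal(scale=0.01, size=(cf_dim, llm_dim))  # trainable adapter weights

def adapt(item_text_embedding: np.ndarray) -> np.ndarray:
    """Map a frozen LLM embedding into the recommender's item space."""
    return W @ item_text_embedding

# Toy scoring: dot product between an adapted item and a user vector.
item_vec = adapt(rng.normal(size=llm_dim))
user_vec = rng.normal(size=cf_dim)
print(round(float(item_vec @ user_vec), 4))
```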
- Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis. It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
- Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications [0.7832189413179361]
Large Language Models (LLMs) excel in comprehending and generating human-like text.
This paper explores strategies for integrating Large Language Models (LLMs) with Information Retrieval (IR) systems.
arXiv Detail & Related papers (2023-11-21T02:01:01Z)
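One common integration pattern consistent with this theme is retrieve-then-personalize: a conventional IR scorer supplies candidates and an LLM answers with the user profile in context. `bm25_search` and `call_llm` below are placeholders, not APIs from the paper:

```python
def bm25_search(query: str, k: int) -> list[str]:
    """Placeholder: top-k documents from a lexical index."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: any chat-completion client."""
    raise NotImplementedError

def personalized_answer(query: str, user_profile: str, k: int = 5) -> str:
    docs = bm25_search(query, k)            # cheap first-stage retrieval
    context = "\n---\n".join(docs)
    return call_llm(
        f"User profile: {user_profile}\nQuery: {query}\n"
        f"Documents:\n{context}\n"
        "Answer using only these documents, ranked by relevance to this user."
    )
```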
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Large Language Model Augmented Narrative Driven Recommendations [51.77271767160573]
Narrative-driven recommendation (NDR) presents an information access problem where users solicit recommendations with verbose descriptions of their preferences and context.
NDR lacks abundant training data for models, and current platforms commonly do not support these requests.
We use large language models (LLMs) for data augmentation to train NDR models.
arXiv Detail & Related papers (2023-06-04T03:46:45Z)
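The augmentation recipe can be sketched as prompting an LLM to write the verbose, narrative request a user might have made for an item they liked, yielding (query, item) pairs to train an NDR retriever; `call_llm` is a placeholder, not the paper's code:

```python
def call_llm(prompt: str) -> str:
    """Placeholder: any chat-completion client."""
    raise NotImplementedError

def synthesize_ndr_pairs(liked_items: list[str]) -> list[tuple[str, str]]:
    pairs = []
    for item in liked_items:
        narrative = call_llm(
            "Write a first-person, context-rich recommendation request "
            f"for which '{item}' would be a perfect answer."
        )
        pairs.append((narrative, item))  # query-document pair for training
    return pairs
```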