IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering
- URL: http://arxiv.org/abs/2510.23536v1
- Date: Mon, 27 Oct 2025 17:12:49 GMT
- Title: IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering
- Authors: Jieyong Kim, Maryam Amirizaniani, Soojin Yoon, Dongha Lee,
- Abstract summary: We introduce the concept of core intents: intents users prioritize when selecting answers to satisfy their information needs. Since users do not explicitly state their intents, we derive core intents from observable behavior patterns in answer selection. We construct a dataset with various domains through systematic filtering, LLM-based annotation, and rigorous quality control.
- Score: 13.337602043970051
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Intent identification serves as the foundation for generating appropriate responses in personalized question answering (PQA). However, existing benchmarks evaluate only response quality or retrieval performance without directly measuring intent identification capabilities. This gap is critical because without understanding which intents users prioritize, systems cannot generate responses satisfying individual information needs. To address this, we introduce the concept of core intents: intents users prioritize when selecting answers to satisfy their information needs. To evaluate these core intents, we propose IPQA, a benchmark for core Intent identification in Personalized Question Answering. Since users do not explicitly state their prioritized intents, we derive core intents from observable behavior patterns in answer selection, grounded in satisficing theory where users choose answers meeting their acceptance thresholds. We construct a dataset with various domains through systematic filtering, LLM-based annotation, and rigorous quality control combining automated verification with human validation. Experimental evaluations across state-of-the-art language models reveal that current systems struggle with core intent identification in personalized contexts. Models fail to identify core intents from user histories, with performance degrading as question complexity increases. The code and dataset will be made publicly available to facilitate future research in this direction.
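To make the evaluation setup concrete, the following is a minimal sketch of how core-intent identification from a user's answer-selection history might be scored. It is an illustrative assumption, not the released IPQA code: the field names, the predict() interface, and the set-level F1 metric are all hypothetical.

```python
# Hypothetical evaluation sketch for core-intent identification, in the spirit of
# IPQA but not its released code: field names, the predict() interface, and the
# set-level F1 metric are illustrative assumptions.
from typing import Callable


def intent_f1(predicted: set, gold: set) -> float:
    """Set-level F1 between predicted intents and annotated core intents."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate_core_intents(examples: list, predict: Callable) -> float:
    """Average F1 over examples; each example carries the question, the user's
    answer-selection history, and the annotated core intents."""
    scores = [
        intent_f1(predict(ex["question"], ex["history"]), set(ex["core_intents"]))
        for ex in examples
    ]
    return sum(scores) / len(scores)
```

Here predict would wrap a language model that maps the user's history plus the new question to a set of predicted intents; the benchmark's actual protocol and metrics may differ.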
Related papers
- Learning Steerable Clarification Policies with Collaborative Self-play [67.67872810596839]
To handle ambiguous queries, AI assistants need a policy for managing their uncertainty. We propose to train steerable policies for managing this uncertainty using self-play. We show this leads to a steerable policy that changes its behavior predictably conditioned on the provided costs.
arXiv Detail & Related papers (2025-12-03T18:49:54Z)
- Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It [81.50711040539566]
Current large language model (LLM) development treats task-solving and preference alignment as separate challenges. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks. Our framework creates scenarios where identical questions require different reasoning chains depending on user context.
arXiv Detail & Related papers (2025-09-30T18:55:28Z)
- BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback [9.980170820190093]
We propose BESPOKE, a realistic benchmark for evaluating personalization in search-augmented large language models. BESPOKE is designed to be realistic, collecting authentic chat and search histories directly from humans. We conduct systematic analyses that reveal key requirements for effective personalization in information-seeking tasks.
arXiv Detail & Related papers (2025-09-25T12:53:07Z)
- Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question Answering [57.12316804290369]
Personalization is essential for adapting question answering systems to user-specific information needs. We propose Pathways of Thoughts (PoT), an inference-stage method that applies to any large language model (LLM) without requiring task-specific fine-tuning. PoT consistently outperforms competitive baselines, achieving up to a 13.1% relative improvement.
arXiv Detail & Related papers (2025-09-23T14:44:46Z)
- A Generative Framework for Personalized Sticker Retrieval [73.57899194210141]
We propose PEARL, a novel generative framework for personalized sticker retrieval. We make two key contributions: (i) to encode user-specific sticker preferences, we design a representation learning model to learn discriminative user representations, and (ii) to generate stickers aligned with a user's query intent, we propose a novel intent-aware learning objective. Empirical results from both offline evaluations and online tests demonstrate that PEARL significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-09-22T13:11:44Z)
- Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering [9.740376003100437]
We propose a novel recommendation framework designated as Bilateral Intent-guided Graph Collaborative Filtering (BIGCF).
Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual and collective intents.
To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy.
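As an illustration of how such a Gaussian-based strategy can be realized, the sketch below encodes each user or item node as a Gaussian over embeddings and samples via the reparameterization trick. This is a generic example under assumed shapes, not BIGCF's actual implementation.

```python
# Illustrative sketch (not BIGCF's actual code): encode user/item features as
# Gaussian distributions and sample embeddings with the reparameterization trick,
# one common way a "Gaussian-based graph generation strategy" can be realized.
import torch
import torch.nn as nn


class GaussianNodeEncoder(nn.Module):
    def __init__(self, num_nodes: int, dim: int):
        super().__init__()
        self.mu = nn.Embedding(num_nodes, dim)      # per-node mean
        self.logvar = nn.Embedding(num_nodes, dim)  # per-node log-variance

    def forward(self, node_ids: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.mu(node_ids), self.logvar(node_ids)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)                 # reparameterization trick
        return mu + eps * std                       # sampled node embeddings
```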
arXiv Detail & Related papers (2024-05-15T02:31:26Z)
- Towards Reliable and Factual Response Generation: Detecting Unanswerable Questions in Information-Seeking Conversations [16.99952884041096]
Generative AI models face the challenge of hallucinations that can undermine users' trust in such systems.
We approach the problem of conversational information seeking as a two-step process, where relevant passages in a corpus are identified first and then summarized into a final system response.
Specifically, our proposed method employs a sentence-level classifier to detect if the answer is present, then aggregates these predictions on the passage level, and eventually across the top-ranked passages to arrive at a final answerability estimate.
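A minimal sketch of that aggregation follows; the max-pooling and the fixed threshold are assumptions for illustration, and the paper's exact pooling functions may differ.

```python
# Minimal sketch of the described aggregation (assumed max-pooling and threshold):
# sentence-level answerability scores are pooled to passage level, then combined
# across the top-ranked passages into a final answerability estimate.
def passage_answerability(sentence_scores: list) -> float:
    """Aggregate sentence-level predictions to a passage-level score."""
    return max(sentence_scores) if sentence_scores else 0.0


def question_answerability(ranked_passages: list, k: int = 3, threshold: float = 0.5) -> bool:
    """Combine passage-level scores over the top-k ranked passages."""
    passage_scores = [passage_answerability(s) for s in ranked_passages[:k]]
    return max(passage_scores, default=0.0) >= threshold
```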
arXiv Detail & Related papers (2024-01-21T10:15:36Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions [35.557857101679296]
Human-assisting systems must take thoughtful, appropriate actions for ambiguous user requests.
We develop a model that classifies ambiguous user requests into corresponding system actions.
Experiments show that the PU learning method achieved better performance than the general positive/negative learning method.
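For readers unfamiliar with positive-unlabeled (PU) learning, the sketch below shows one common recipe in the Elkan-and-Noto style: train a "labeled vs. unlabeled" classifier and rescale its scores by the estimated labeling propensity. It is a generic example, not the paper's implementation, and the propensity estimate would normally use held-out positives.

```python
# Generic positive-unlabeled (PU) learning sketch (Elkan & Noto style), not the
# paper's method: fit a classifier on "labeled vs. unlabeled", then rescale its
# scores by c = P(labeled | positive), here estimated on the positive examples.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_pu_classifier(X_pos: np.ndarray, X_unlabeled: np.ndarray):
    X = np.vstack([X_pos, X_unlabeled])
    s = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])  # labeled indicator
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    c = clf.predict_proba(X_pos)[:, 1].mean()  # estimated labeling propensity
    return clf, c


def predict_positive_proba(clf, c: float, X: np.ndarray) -> np.ndarray:
    return np.clip(clf.predict_proba(X)[:, 1] / c, 0.0, 1.0)  # rescaled scores
```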
arXiv Detail & Related papers (2021-06-15T09:28:39Z)
- Interactive Question Clarification in Dialogue via Reinforcement Learning [36.746578601398866]
We propose a reinforcement model to clarify ambiguous questions by suggesting refinements of the original query.
The model is trained using reinforcement learning with a deep policy network.
We evaluate our model based on real-world user clicks and demonstrate significant improvements.
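As a rough illustration of this setup, the sketch below trains a policy over candidate query refinements with a REINFORCE-style update and a hypothetical click-based reward; the network, shapes, and reward signal are assumptions, not the paper's implementation.

```python
# Generic REINFORCE-style sketch (PyTorch) for picking query refinements from
# click-based rewards; hypothetical architecture and reward, not the paper's code.
import torch
import torch.nn as nn


class RefinementPolicy(nn.Module):
    def __init__(self, query_dim: int, num_refinements: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(query_dim, 128), nn.ReLU(), nn.Linear(128, num_refinements)
        )

    def forward(self, query_vec: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(query_vec))


def reinforce_step(policy: RefinementPolicy, optimizer, query_vec: torch.Tensor, reward_fn):
    dist = policy(query_vec)
    action = dist.sample()                  # choose a candidate refinement
    reward = reward_fn(action)              # e.g. 1.0 if the user clicked a result, else 0.0
    loss = -dist.log_prob(action) * reward  # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```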
arXiv Detail & Related papers (2020-12-17T06:38:04Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)
- IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems [80.0781718687327]
We analyze user intent patterns in information-seeking conversations and propose an intent-aware neural response ranking model, "IART".
IART is built on top of the integration of user intent modeling and language representation learning with the Transformer architecture.
arXiv Detail & Related papers (2020-02-03T05:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.