Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval
- URL: http://arxiv.org/abs/2510.08935v1
- Date: Fri, 10 Oct 2025 02:24:09 GMT
- Title: Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval
- Authors: Yingyi Zhang, Pengyue Jia, Derong Xu, Yi Wen, Xianneng Li, Yichao Wang, Wenlin Zhang, Xiaopeng Li, Weinan Gan, Huifeng Guo, Yong Liu, Xiangyu Zhao
- Abstract summary: Personalize Before Retrieve (PBR) is a framework that incorporates user-specific signals into query expansion prior to retrieval. PBR consistently outperforms strong baselines, with up to 10% gains on PersonaBench across retrievers.
- Score: 34.298743064665395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) critically depends on effective query expansion to retrieve relevant information. However, existing expansion methods adopt uniform strategies that overlook user-specific semantics, ignoring individual expression styles, preferences, and historical context. In practice, textually identical queries can express vastly different intentions across users. This representational rigidity limits the ability of current RAG systems to generalize effectively in personalized settings. Specifically, we identify two core challenges for personalization: 1) user expression styles are inherently diverse, making it difficult for standard expansions to preserve personalized intent; 2) user corpora induce heterogeneous semantic structures, varying in topical focus and lexical organization, which hinders the effective anchoring of expanded queries within the user's corpus space. To address these challenges, we propose Personalize Before Retrieve (PBR), a framework that incorporates user-specific signals into query expansion prior to retrieval. PBR consists of two components: P-PRF, which generates stylistically aligned pseudo feedback from user history to simulate the user's expression style, and P-Anchor, which performs graph-based structure alignment over the user's corpus to capture its structure. Together, they produce personalized query representations tailored for retrieval. Experiments on two personalized benchmarks show that PBR consistently outperforms strong baselines, with up to 10% gains on PersonaBench across retrievers. Our findings demonstrate the value of modeling personalization before retrieval to close the semantic gap in user-adaptive RAG systems. Our code is available at https://github.com/Zhang-Yingyi/PBR-code.
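The "personalize before retrieve" idea can be illustrated in a few lines: before a query is issued to the retriever, mine the user's own history for characteristic terms and fold them into the query so that expansion reflects that user's vocabulary rather than a uniform strategy. The sketch below is only a minimal frequency-based illustration of this general idea; PBR's actual P-PRF and P-Anchor components use an LLM and graph-based structure alignment, and the function name and sample history here are hypothetical.

```python
# Minimal sketch of pre-retrieval personalized query expansion:
# append the user's most characteristic history terms to the query.
# This is an illustration of the general idea only, not PBR itself.
from collections import Counter
import re


def tokenize(text):
    # Lowercase word tokens; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def personalized_expansion(query, user_history, top_k=3):
    """Expand `query` with up to `top_k` frequent terms from the
    user's own history, skipping terms already in the query."""
    query_terms = set(tokenize(query))
    counts = Counter(t for doc in user_history for t in tokenize(doc))
    candidates = [(t, c) for t, c in counts.items() if t not in query_terms]
    # Most frequent history terms first (stable sort keeps corpus order on ties).
    candidates.sort(key=lambda tc: -tc[1])
    expansion = [t for t, _ in candidates[:top_k]]
    return " ".join([query] + expansion)


# Hypothetical user history: the same query "best shoes" would expand
# differently for a user whose notes are about, say, hiking instead.
history = [
    "training schedule for my marathon next month",
    "marathon nutrition plan and long run recovery",
]
print(personalized_expansion("best shoes", history))
```

In a real system the expanded string (or an embedding of it) would then be passed to the retriever; stopword filtering and weighting schemes such as TF-IDF would replace the raw counts used here.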
Related papers
- Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering [39.08300602619814]
Personalization in Question Answering (QA) requires answers that are both accurate and aligned with users' background, preferences, and historical context. We propose PR2, a reinforcement learning framework that integrates reasoning and retrieval from personal context for personalization.
arXiv Detail & Related papers (2026-02-22T19:43:43Z) - P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling [66.55381105691818]
We propose P-GenRM, the first Personalized Generative Reward Model with test-time user-based scaling. P-GenRM transforms preference signals into structured evaluation chains that derive adaptive personas and scoring rubrics. It further clusters users into User Prototypes and introduces a dual-granularity scaling mechanism.
arXiv Detail & Related papers (2026-02-12T16:07:22Z) - Reasoning-Based Personalized Generation for Users with Sparse Data [120.94029850012045]
We introduce GraSPer, a novel framework for enhancing personalized text generation under sparse context. GraSPer first augments user context by predicting items that the user would likely interact with in the future. With reasoning alignment, it then generates texts for these interactions to enrich the augmented context. In the end, it generates personalized outputs conditioned on both the real and synthetic histories.
arXiv Detail & Related papers (2026-01-31T01:54:23Z) - OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents [55.27061195244624]
We formalize over-personalization into three types: Irrelevance, Repetition, and Sycophancy. Agents tend to retrieve and over-attend to user memories even when unnecessary. Our work takes an initial step toward more controllable and appropriate personalization in memory-augmented dialogue systems.
arXiv Detail & Related papers (2026-01-20T08:27:13Z) - Personalized Reward Modeling for Text-to-Image Generation [9.780251969338044]
We present PIGReward, a personalized reward model that dynamically generates user-conditioned evaluation dimensions and assesses images through CoT reasoning. PIGReward provides personalized feedback that drives user-specific prompt optimization, improving alignment between generated images and individual intent. Extensive experiments demonstrate that PIGReward surpasses existing methods in both accuracy and interpretability.
arXiv Detail & Related papers (2025-11-21T12:04:24Z) - PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization [4.624026598342624]
We propose PrLM, a reinforcement learning framework that trains LLMs to explicitly reason over retrieved user profiles. PrLM effectively learns from user responses without requiring annotated reasoning paths. Experiments on three personalized text generation datasets show that PrLM outperforms existing methods.
arXiv Detail & Related papers (2025-08-10T13:37:26Z) - LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [78.73711446918814]
We propose a novel framework named LATex for AG-ReID, which adopts prompt-tuning strategies to leverage attribute-based text knowledge. Our framework can fully leverage attribute-based text knowledge to improve AG-ReID performance.
arXiv Detail & Related papers (2025-03-31T04:47:05Z) - Personalized Graph-Based Retrieval for Large Language Models [51.7278897841697]
We propose PGraph, a framework that leverages user-centric knowledge graphs to enrich personalization. By directly integrating structured user knowledge into the retrieval process and augmenting prompts with user-relevant context, PGraph enhances contextual understanding and output quality. We also introduce the Personalized Graph-based Benchmark for Text Generation, designed to evaluate personalized text generation tasks in real-world settings where user history is sparse or unavailable.
arXiv Detail & Related papers (2025-01-04T01:46:49Z) - Preference Adaptive and Sequential Text-to-Image Generation [24.787970969428976]
We create a novel dataset of sequential preferences, which we leverage together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification.
arXiv Detail & Related papers (2024-12-10T01:47:40Z) - ULMRec: User-centric Large Language Model for Sequential Recommendation [16.494996929730927]
We propose ULMRec, a framework that integrates user personalized preferences into Large Language Models. Extensive experiments on two public datasets demonstrate that ULMRec significantly outperforms existing methods.
arXiv Detail & Related papers (2024-12-07T05:37:00Z) - A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) extends the predefined user persona description with semantically correlated content.
PEE consists of two main modules: persona exploration and persona exploitation.
Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)