PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization
- URL: http://arxiv.org/abs/2508.07342v1
- Date: Sun, 10 Aug 2025 13:37:26 GMT
- Title: PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization
- Authors: Kepu Zhang, Teng Shi, Weijie Yu, Jun Xu
- Abstract summary: We propose PrLM, a reinforcement learning framework that trains LLMs to explicitly reason over retrieved user profiles. PrLM effectively learns from user responses without requiring annotated reasoning paths. Experiments on three personalized text generation datasets show that PrLM outperforms existing methods.
- Score: 4.624026598342624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized retrieval-augmented generation (RAG) aims to produce user-tailored responses by incorporating retrieved user profiles alongside the input query. Existing methods primarily focus on improving retrieval and rely on large language models (LLMs) to implicitly integrate the retrieved context with the query. However, such models are often sensitive to retrieval quality and may generate responses that are misaligned with user preferences. To address this limitation, we propose PrLM, a reinforcement learning framework that trains LLMs to explicitly reason over retrieved user profiles. Guided by a contrastively trained personalization reward model, PrLM effectively learns from user responses without requiring annotated reasoning paths. Experiments on three personalized text generation datasets show that PrLM outperforms existing methods and remains robust across varying numbers of retrieved profiles and different retrievers.
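The abstract gives no implementation details, so the following is only a minimal sketch of how a contrastively trained personalization reward model might look, assuming a Bradley-Terry-style pairwise loss; the class and function names are hypothetical, not from the paper.

```python
# Hypothetical sketch of a contrastively trained personalization reward
# model. The encoder is any HuggingFace-style text encoder; names such as
# PersonalizationRewardModel are illustrative, not from the paper.
import torch.nn as nn
import torch.nn.functional as F

class PersonalizationRewardModel(nn.Module):
    """Scores how well a response fits a (user profile, query) pair."""
    def __init__(self, encoder, hidden=768):
        super().__init__()
        self.encoder = encoder            # frozen or fine-tuned text encoder
        self.head = nn.Linear(hidden, 1)  # scalar reward head

    def forward(self, input_ids, attention_mask):
        # Encode the concatenated (profile, query, response) text and read
        # the [CLS] position as a pooled representation.
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.head(h).squeeze(-1)

def contrastive_reward_loss(r_pos, r_neg):
    # Pairwise (Bradley-Terry) objective: the user's actual response should
    # outscore a mismatched or generic response for the same profile/query.
    # The trained reward then serves as the RL signal for the generator.
    return -F.logsigmoid(r_pos - r_neg).mean()
```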
Related papers
- Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions [50.70965714314064]
Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions.
arXiv Detail & Related papers (2026-03-04T15:42:43Z)
- Learning to Reason for Multi-Step Retrieval of Personal Context in Personalized Question Answering [39.08300602619814]
Personalization in Question Answering (QA) requires answers that are both accurate and aligned with users' background, preferences, and historical context. We propose PR2, a reinforcement learning framework that integrates reasoning and retrieval from personal context for personalization.
arXiv Detail & Related papers (2026-02-22T19:43:43Z)
- Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization [27.490675380289318]
We argue that relevance serves as an unreliable proxy for utility. We propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization. In contrast to a greedy selection of the most relevant records, PURPLE treats profile construction as a set generation process.
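As an illustration of the bandit framing only: the sketch below scores candidate profile records with a generic LinUCB-style contextual bandit whose context includes the records already chosen, so selection depends on the set rather than on per-record relevance. PURPLE's actual set-generation policy is richer, and every name here is hypothetical.

```python
# LinUCB-style bandit for set-aware profile construction (illustrative only;
# PURPLE's set-generation policy is more involved). Feature dimension is
# 3 * d: query vector, candidate record, and mean of records chosen so far.
import numpy as np

class LinUCB:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # ridge-regression Gram matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def score(self, x):
        theta = np.linalg.solve(self.A, self.b)
        explore = self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))
        return x @ theta + explore

    def update(self, x, reward):
        # `reward` should be downstream LLM utility, not retrieval relevance.
        self.A += np.outer(x, x)
        self.b += reward * x

def build_profile(bandit, q, records, k=3):
    # q: [d] query embedding; records: [N, d] array of record embeddings.
    chosen, mean_chosen = [], np.zeros_like(q)
    for _ in range(k):
        cands = [i for i in range(len(records)) if i not in chosen]
        feats = [np.concatenate([q, records[i], mean_chosen]) for i in cands]
        best = cands[int(np.argmax([bandit.score(x) for x in feats]))]
        chosen.append(best)
        mean_chosen = records[chosen].mean(axis=0)  # set-aware context update
    return chosen
```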
arXiv Detail & Related papers (2026-01-17T15:05:36Z)
- Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks. We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
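In outline, pseudo-document expansion can be sketched as below; `generate` and `retrieve` are stand-ins for the RL-trained policy and the fixed retriever, and the recall@k reward is our illustrative assumption rather than the paper's exact setup.

```python
# Sketch of pseudo-document query expansion with an on-policy reward.
# `generate` (the LLM policy) and `retrieve` (the retriever) are stand-ins.
def expand_and_retrieve(query, generate, retrieve, k=10):
    pseudo_doc = generate(
        f"Write a short passage that would answer: {query}"
    )                                   # the policy output trained with RL
    return retrieve(query + " " + pseudo_doc, k=k)

def retrieval_reward(retrieved_ids, relevant_ids):
    # Example reward: recall@k of the retrieved set against gold documents.
    hits = len(set(retrieved_ids) & set(relevant_ids))
    return hits / max(1, len(relevant_ids))
```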
arXiv Detail & Related papers (2025-10-20T04:16:28Z)
- MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation [11.430206422495829]
The Multi-Aspect Driven LLM Agent (MADRec) is an autonomous recommender that constructs user and item profiles by unsupervised extraction of multi-aspect information from reviews. MADRec generates structured profiles via aspect-category-based summarization and applies Re-Ranking to construct high-density inputs. Experiments across multiple domains show that MADRec outperforms traditional and LLM-based baselines in both precision and explainability.
arXiv Detail & Related papers (2025-10-15T10:03:29Z)
- FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization, which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. We generate over 1M synthetic personalized preferences using publicly available LLMs. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
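One very reduced way to picture few-shot adaptation to a user is prompt-level: condition the model on a handful of that user's labeled preference pairs before ranking new candidates. FSPO's actual method is meta-learned reward modeling, so the helper below is only a hypothetical illustration.

```python
# Hypothetical few-shot preference prompt: the model conditions on a few of
# the user's labeled preference pairs before ranking two new candidates.
# FSPO itself meta-learns a reward model; this is only a reduced picture.
def few_shot_preference_prompt(user_prefs, query, cand_a, cand_b):
    # user_prefs: iterable of (prompt, chosen, rejected) triples for one user
    shots = "\n\n".join(
        f"Prompt: {p}\nPreferred: {c}\nRejected: {r}"
        for p, c, r in user_prefs
    )
    return (
        f"{shots}\n\nPrompt: {query}\n"
        f"A: {cand_a}\nB: {cand_b}\n"
        "Which response would this user prefer? Answer A or B."
    )
```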
arXiv Detail & Related papers (2025-02-26T17:08:46Z)
- PersonalLLM: Tailoring LLMs to Individual Preferences [11.717169516971856]
We present a public benchmark, PersonalLLM, focusing on adapting LLMs to provide maximal benefits for a particular user. We curate open-ended prompts paired with many high-quality answers over which users would be expected to display heterogeneous latent preferences. Our dataset and generated personalities offer an innovative testbed for developing personalization algorithms.
arXiv Detail & Related papers (2024-09-30T13:55:42Z)
- MoRE: A Mixture of Reflectors Framework for Large Language Model-Based Sequential Recommendation [16.10791252542592]
Large language models (LLMs) have emerged as a cutting-edge approach in sequential recommendation. We propose MoRE, which introduces three perspective-aware offline reflection processes to address these gaps. MoRE's meta-reflector employs a self-improving strategy and a dynamic selection mechanism to adapt to evolving user preferences.
arXiv Detail & Related papers (2024-09-10T09:58:55Z)
- Few-shot Personalization of LLMs with Mis-aligned Responses [40.0349773257245]
This paper proposes a new approach for few-shot personalization of large language models (LLMs). Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs. During this iterative process of prompt improvement, we incorporate the contexts of misaligned responses from LLMs.
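A bare-bones loop for this kind of refinement might look as follows, where `llm` is a text-in/text-out callable and the exact-match check stands in for a real alignment measure; both simplifications are ours.

```python
# Sketch of iterative prompt refinement driven by misaligned responses.
# `llm` is any text-in/text-out callable; exact match is a crude stand-in
# for a real alignment check.
def personalize_prompt(llm, base_prompt, examples, rounds=3):
    prompt = base_prompt
    for _ in range(rounds):
        misaligned = []
        for question, gold in examples:
            out = llm(prompt + "\n" + question)
            if out.strip() != gold.strip():       # collect failure cases
                misaligned.append((question, gold, out))
        if not misaligned:
            break
        feedback = "\n".join(
            f"Q: {q}\nExpected: {g}\nGot: {o}" for q, g, o in misaligned
        )
        prompt = llm(
            "Rewrite the prompt so the answers match this user's expectations.\n"
            f"Current prompt:\n{prompt}\nMisaligned cases:\n{feedback}"
        )
    return prompt
```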
arXiv Detail & Related papers (2024-06-26T18:29:12Z)
- Learning to Retrieve Iteratively for In-Context Learning [56.40100968649039]
Iterative retrieval is a novel framework that empowers retrievers to make iterative decisions through policy optimization. We instantiate an iterative retriever for composing in-context learning exemplars and apply it to various semantic parsing tasks. By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever.
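To make the state-encoding idea concrete, here is a toy version under our own assumptions: a small GRU cell (the only trainable part, loosely analogous to the paper's ~4M-parameter state encoder) updates the query vector after each retrieval step while the dense scorer stays frozen.

```python
# Toy stateful iterative retriever: a GRU cell updates the query state after
# each pick; the dense retriever (a frozen doc-embedding matrix) is untouched.
# Dimensions and names are illustrative.
import torch
import torch.nn as nn

class StatefulRetriever(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.state = nn.GRUCell(dim, dim)   # the only trainable parameters

    def forward(self, q_vec, doc_vecs, steps=3, k=1):
        # q_vec: [1, dim] query embedding; doc_vecs: [N, dim] doc embeddings
        h, picked = q_vec, []
        for _ in range(steps):
            scores = doc_vecs @ h.squeeze(0)   # frozen dense scoring
            idx = scores.topk(k).indices
            picked.extend(idx.tolist())
            # Fold the retrieved documents into the query state.
            h = self.state(doc_vecs[idx].mean(0, keepdim=True), h)
        return picked
```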
arXiv Detail & Related papers (2024-06-20T21:07:55Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
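Paraphrasing the idea in code (not the paper's exact objective): a DPO-style pairwise contrast is applied to every chosen/rejected pair across prompts, with pairs from more similar prompts weighted more heavily. The inputs are policy-versus-reference log-ratios, and all names are our assumptions.

```python
# Illustrative relative-preference loss: DPO-style contrasts across prompts,
# reweighted by prompt similarity. logratio_* are policy-vs-reference
# log-probability ratios; this paraphrases the idea, not the exact loss.
import torch
import torch.nn.functional as F

def rpo_style_loss(logratio_chosen, logratio_rejected,
                   prompt_emb_chosen, prompt_emb_rejected, beta=0.1):
    # Contrast matrix: every chosen response vs. every rejected response,
    # including pairs that come from different (but related) prompts.
    margins = beta * (logratio_chosen[:, None] - logratio_rejected[None, :])
    sim = F.cosine_similarity(
        prompt_emb_chosen[:, None, :], prompt_emb_rejected[None, :, :], dim=-1
    )
    weights = torch.softmax(sim.flatten(), dim=0).view_as(sim)  # similar prompts weigh more
    return -(weights * F.logsigmoid(margins)).sum()
```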
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and effective at triggering hallucinations in large language models.
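Prompt chaining here just means composing LLM calls: one call proposes an alternative answer, the next edits the evidence to support it, and a third checks the edit. The sketch below is our own minimal reading of that loop; `llm` and the prompt wording are assumptions.

```python
# Minimal reading of evidence perturbation via prompt chaining: propose an
# alternative answer, edit the evidence to support it, then verify the edit.
# `llm` is a text-in/text-out callable; all prompt wording is assumed.
def perturb_evidence(llm, question, evidence):
    new_answer = llm(f"Give a plausible but incorrect answer to: {question}")
    new_evidence = llm(
        "Minimally edit the passage so it supports the new answer.\n"
        f"Passage: {evidence}\nNew answer: {new_answer}"
    )
    check = llm(
        f"Does this passage support the answer '{new_answer}'? Answer yes or no.\n"
        f"{new_evidence}"
    )
    # Keep only verified perturbations as new hallucination test cases.
    return (question, new_evidence, new_answer) if "yes" in check.lower() else None
```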
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- Query Rewriting for Retrieval-Augmented Large Language Models [139.242907155883]
Large Language Models (LLMs) play the role of powerful, black-box readers in the retrieve-then-read pipeline. This work introduces a new framework, Rewrite-Retrieve-Read, which replaces the previous retrieve-then-read pipeline for retrieval-augmented LLMs.
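The pipeline itself is simple to sketch; `llm` and `search` below are stand-ins for the rewriter/reader model and the retriever, and the prompt wording is ours.

```python
# Sketch of a rewrite-retrieve-read pipeline. `llm` and `search` stand in
# for the rewriter/reader model and the retriever; prompt wording is ours.
def rewrite_retrieve_read(llm, search, question, k=5):
    rewrite = llm(f"Rewrite this question as a web search query: {question}")
    docs = search(rewrite, k=k)            # retrieve with the rewritten query
    context = "\n".join(docs)
    return llm(
        f"Answer the question using the context.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )
```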
arXiv Detail & Related papers (2023-05-23T17:27:50Z)