RPM: Reasoning-Level Personalization for Black-Box Large Language Models
- URL: http://arxiv.org/abs/2505.21082v4
- Date: Wed, 15 Oct 2025 08:31:17 GMT
- Title: RPM: Reasoning-Level Personalization for Black-Box Large Language Models
- Authors: Jieyong Kim, Tongyoung Kim, Soojin Yoon, Jaehyung Kim, Dongha Lee
- Abstract summary: This work introduces reasoning-level personalization as a new paradigm. RPM is the first systematic framework designed to guide the model's reasoning process using structured rationales constructed from patterns in a user's behavior.
- Score: 13.102489006219548
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While black-box large language models are widely deployed, they produce generic outputs that overlook individual user preferences. Current personalization methods are fundamentally limited to response-level personalization; they only match final outputs, failing to model the underlying reasoning that connects user behavior to responses. To address this, this work introduces reasoning-level personalization as a new paradigm and proposes RPM, the first systematic framework designed to guide the model's reasoning process using structured rationales constructed from patterns in a user's behavior. RPM constructs a structured model of user behavior, built from response-influential features and statistical factors, to create personalized reasoning paths and retrieve beneficial examples for guiding inference through a feature-based retrieval mechanism. Extensive experiments across four diverse tasks demonstrate that RPM consistently outperforms existing response-level methods while simultaneously enhancing both personalization performance and interpretability, providing a promising direction for black-box LLM personalization.
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z)
- P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling [66.55381105691818]
We propose P-GenRM, the first Personalized Generative Reward Model with test-time user-based scaling. P-GenRM transforms preference signals into structured evaluation chains that derive adaptive personas and scoring rubrics. It further clusters users into User Prototypes and introduces a dual-granularity scaling mechanism.
arXiv Detail & Related papers (2026-02-12T16:07:22Z)
- One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment [55.86333374784959]
We argue that addressing these constraints requires a paradigm shift from fitting data to learn user preferences to learning the process of preference adaptation. We propose Meta Reward Modeling (MRM), which reformulates personalized reward modeling as a meta-learning problem. We show that MRM enhances few-shot personalization, improves user robustness, and consistently outperforms baselines.
arXiv Detail & Related papers (2026-01-26T17:55:52Z)
- Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization [8.34180795290891]
Difference-aware Reasoning Personalization is a framework that reconstructs the difference extraction mechanism by leveraging inference scaling to enhance personalization. The LLM autonomously identifies relevant difference feature dimensions and generates structured definitions and descriptions, enabling slow, deliberate reasoning (System-2 thinking) over user differences.
arXiv Detail & Related papers (2025-11-19T12:35:40Z)
- Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models [16.152962349146275]
We propose Reflective Personalization Optimization (RPO), a framework that redefines the personalization paradigm by decoupling content generation from alignment. RPO operates in two distinct stages: first, a base model generates a high-quality, generic response; then, an external reflection module explicitly rewrites this output to align with the user's preferences. Comprehensive experiments on the LaMP benchmark demonstrate that RPO, by decoupling content generation from personalization, significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-07T14:48:49Z)
- PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization [4.624026598342624]
We propose PrLM, a reinforcement learning framework that trains LLMs to explicitly reason over retrieved user profiles. PrLM effectively learns from user responses without requiring annotated reasoning paths. Experiments on three personalized text generation datasets show that PrLM outperforms existing methods.
arXiv Detail & Related papers (2025-08-10T13:37:26Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
- Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment [35.68913976348608]
We introduce the Reinforcement Learning for Personalized Alignment (RLPA) framework to iteratively infer and refine user profiles through dialogue. We instantiate RLPA by fine-tuning Qwen-2.5-3B-Instruct, resulting in Qwen-RLPA, which achieves state-of-the-art performance in personalized dialogue.
arXiv Detail & Related papers (2025-05-21T12:38:36Z)
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
- HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation [24.67727411391369]
HyPerAlign is an interpretable and sample-efficient hypothesis-driven personalization approach for large language models. We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment. Results demonstrate the superiority of hypothesis-driven personalization compared to preference-based fine-tuning methods.
arXiv Detail & Related papers (2025-04-29T18:01:46Z)
- Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization [68.79814761867314]
We propose Difference-aware Personalization Learning (DPL) to enhance Large Language Models (LLMs) personalization. DPL strategically selects representative users for comparison and establishes a structured standard to extract task-relevant differences. Experiments on real-world datasets demonstrate that DPL significantly enhances LLM personalization.
arXiv Detail & Related papers (2025-03-04T09:53:26Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- ULMRec: User-centric Large Language Model for Sequential Recommendation [16.494996929730927]
We propose ULMRec, a framework that integrates user personalized preferences into Large Language Models. Extensive experiments on two public datasets demonstrate that ULMRec significantly outperforms existing methods.
arXiv Detail & Related papers (2024-12-07T05:37:00Z)
- LLMs + Persona-Plug = Personalized LLMs [41.60364110693824]
Personalization plays a critical role in numerous language tasks and applications, since users with the same requirements may prefer diverse outputs based on their individual interests.
This has led to the development of various personalized approaches aimed at adapting large language models (LLMs) to generate customized outputs aligned with user preferences.
We propose a novel personalized LLM model, ours. It constructs a user-specific embedding for each individual by modeling all her historical contexts through a lightweight plug-in user embedder module.
arXiv Detail & Related papers (2024-09-18T11:54:45Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- An LLM Feature-based Framework for Dialogue Constructiveness Assessment [8.87747076871578]
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness and (ii) predicting constructiveness outcomes following dialogues for such use cases.
These objectives can be achieved by training either interpretable feature-based models or neural models such as pre-trained language models.
We propose an LLM feature-based framework for dialogue constructiveness assessment that combines the strengths of feature-based and neural approaches.
arXiv Detail & Related papers (2024-06-20T22:10:52Z)
- Personalized Large Language Models [1.0881867638866944]
This paper investigates methods to personalize large language models (LLMs).
Results demonstrate that personalized fine-tuning improves model reasoning compared to non-personalized models.
Experiments on datasets for emotion recognition and hate speech detection show consistent performance gains with personalized methods.
arXiv Detail & Related papers (2024-02-14T15:55:30Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following.
This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models.
To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
arXiv Detail & Related papers (2023-11-18T03:05:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.