Personalized LLM Decoding via Contrasting Personal Preference
- URL: http://arxiv.org/abs/2506.12109v1
- Date: Fri, 13 Jun 2025 09:12:44 GMT
- Title: Personalized LLM Decoding via Contrasting Personal Preference
- Authors: Hyungjune Bu, Chanjoo Jung, Minjae Kang, Jaehyung Kim
- Abstract summary: We propose CoPe, a novel decoding-time approach applied after performing parameter-efficient fine-tuning (PEFT) on user-specific data. Our core idea is to leverage reward-guided decoding specifically for personalization by maximizing each user's implicit reward signal. Our empirical results demonstrate that CoPe achieves strong performance, improving personalization by an average of 10.57% in ROUGE-L.
- Score: 8.469329222500726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) are progressively deployed in various real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization such as prompt-based and training-based methods have been actively explored, the development of effective decoding-time algorithms remains largely overlooked, despite their demonstrated potential. In this paper, we propose CoPe (Contrasting Personal Preference), a novel decoding-time approach applied after performing parameter-efficient fine-tuning (PEFT) on user-specific data. Our core idea is to leverage reward-guided decoding specifically for personalization by maximizing each user's implicit reward signal. We evaluate CoPe across five open-ended personalized text generation tasks. Our empirical results demonstrate that CoPe achieves strong performance, improving personalization by an average of 10.57% in ROUGE-L, without relying on external reward models or additional training procedures.
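For intuition, the sketch below shows one way such reward-guided, contrastive personalized decoding could be realized: score the next token with the PEFT-tuned user model while adding the token-level log-probability gap between the user model and its frozen base model as an implicit reward. The model name, adapter path, contrast weight `alpha`, and greedy token selection are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of contrastive, reward-guided decoding for personalization,
# assuming the contrast is taken between a PEFT-tuned user model and its frozen base.
# Model name, adapter path, `alpha`, and greedy decoding are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-1B-Instruct"   # hypothetical base model
ADAPTER = "path/to/user_lora_adapter"       # hypothetical per-user PEFT adapter

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
user_model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16), ADAPTER
)
base_model.eval()
user_model.eval()

@torch.no_grad()
def personalized_decode(prompt: str, max_new_tokens: int = 64, alpha: float = 1.0) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        user_logp = user_model(ids).logits[:, -1, :].log_softmax(-1)
        base_logp = base_model(ids).logits[:, -1, :].log_softmax(-1)
        # Token-level implicit reward: how much more the user-adapted model
        # prefers each token than the base model does (a DPO-style log-ratio).
        implicit_reward = user_logp - base_logp
        # Reward-guided score: personalized likelihood plus weighted implicit reward.
        scores = user_logp + alpha * implicit_reward
        next_id = scores.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```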
Related papers
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align large language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
- Personalized Language Models via Privacy-Preserving Evolutionary Model Merging [57.161917758405465]
Personalization in large language models (LLMs) seeks to tailor models to individual user or user group preferences. We propose Privacy-Preserving Model Merging via Evolutionary Algorithms (PriME). PriME employs gradient-free methods to directly optimize task-specific metrics while preserving user privacy.
arXiv Detail & Related papers (2025-03-23T09:46:07Z)
- Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization [68.79814761867314]
We propose Difference-aware Personalization Learning (DPL) to enhance personalization of large language models (LLMs). DPL strategically selects representative users for comparison and establishes a structured standard to extract task-relevant differences. Experiments on real-world datasets demonstrate that DPL significantly enhances LLM personalization.
arXiv Detail & Related papers (2025-03-04T09:53:26Z)
- FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization (FSPO), which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. We generate over 1M synthetic personalized preferences using publicly available LLMs. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
arXiv Detail & Related papers (2025-02-26T17:08:46Z)
- Personality-Guided Code Generation Using Large Language Models [14.665759212676488]
We conduct an empirical study on personality-guided code generation using large language models (LLMs). Our results show that personality guidance significantly enhances code generation accuracy, with improved pass rates in 23 out of 28 LLM-dataset combinations.
arXiv Detail & Related papers (2024-10-16T16:42:55Z)
- PAD: Personalized Alignment of LLMs at Decoding-Time [10.347782385286582]
This paper presents a novel framework designed to align LLM outputs with diverse personalized preferences during the inference phase. The Personalized Alignment at Decoding-time (PAD) framework decouples the text generation process from personalized preferences. PAD not only outperforms existing training-based alignment methods in terms of aligning with diverse preferences but also shows significant generalizability to preferences unseen during training.
arXiv Detail & Related papers (2024-10-05T08:00:55Z)
- Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align". We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z)
- Guided Profile Generation Improves Personalization with LLMs [3.2685922749445617]
In modern commercial systems, including recommendation, ranking, and e-commerce platforms, there is a trend towards incorporating personalization context as input into large language models (LLMs).
We propose Guided Profile Generation (GPG), a general method designed to generate personal profiles in natural language.
Our experimental results show that GPG improves LLMs' personalization ability across different tasks; for example, it increases accuracy in predicting personal preference by 37% compared to directly feeding the LLMs raw personal context.
arXiv Detail & Related papers (2024-09-19T21:29:56Z)
- Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models [21.115495457454365]
This paper studies an approach to RAG that involves learning user-dependent LLM parameters through parameter-efficient fine-tuning (PEFT). Our results demonstrate that, on average, RAG- and PEFT-based personalization methods yield 14.92% and 1.07% improvements over non-personalized LLMs, respectively.
arXiv Detail & Related papers (2024-09-14T19:18:26Z)
- Few-shot Personalization of LLMs with Mis-aligned Responses [40.0349773257245]
This paper proposes a new approach for few-shot personalization of large language models (LLMs). Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs. During the iterative process of prompt improvement, we incorporate the contexts of mis-aligned responses by LLMs.
arXiv Detail & Related papers (2024-06-26T18:29:12Z)
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
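As a companion illustration for the SPIN entry above, the snippet below sketches a self-play objective in which the current model is trained to prefer human-written responses over responses sampled from its previous iterate. The logistic, DPO-style loss form and the variable names are assumptions for illustration, not a definitive reproduction of the paper's training recipe.

```python
# A hedged sketch of a SPIN-style self-play objective for one iteration.
# Assumption: a DPO-like logistic loss over log-probability ratios between the
# current policy and the frozen previous iterate ("opponent"); names are illustrative.
import torch
import torch.nn.functional as F

def self_play_loss(
    policy_logp_real: torch.Tensor,    # log pi_theta(y_human | x), summed over tokens
    policy_logp_synth: torch.Tensor,   # log pi_theta(y_synth | x), y_synth sampled from the previous iterate
    opponent_logp_real: torch.Tensor,  # log pi_t(y_human | x) under the frozen previous iterate
    opponent_logp_synth: torch.Tensor, # log pi_t(y_synth | x)
    beta: float = 0.1,
) -> torch.Tensor:
    # Push the (policy - opponent) log-ratio to be larger on human-written
    # responses than on the model's own generations from the previous round.
    margin = (policy_logp_real - opponent_logp_real) - (policy_logp_synth - opponent_logp_synth)
    return -F.logsigmoid(beta * margin).mean()
```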