Reinforced Prompt Personalization for Recommendation with Large Language Models
- URL: http://arxiv.org/abs/2407.17115v2
- Date: Mon, 03 Feb 2025 15:18:05 GMT
- Title: Reinforced Prompt Personalization for Recommendation with Large Language Models
- Authors: Wenyu Mao, Jiancan Wu, Weijian Chen, Chongming Gao, Xiang Wang, Xiangnan He
- Abstract summary: We introduce the concept of instance-wise prompting, aiming at personalizing discrete prompts for individual users.
To improve efficiency and quality, RPP personalizes prompts at the sentence level rather than searching in the vast vocabulary word-by-word.
- Abstract: Designing effective prompts can empower LLMs to understand user preferences and provide recommendations with intent comprehension and knowledge utilization capabilities. Nevertheless, recent studies predominantly concentrate on task-wise prompting, developing fixed prompt templates shared across all users in a given recommendation task (e.g., rating or ranking). Although convenient, task-wise prompting overlooks individual user differences, leading to inaccurate analysis of user interests. In this work, we introduce the concept of instance-wise prompting, aiming to personalize discrete prompts for individual users. Toward this end, we propose Reinforced Prompt Personalization (RPP) to realize it automatically. To improve efficiency and quality, RPP personalizes prompts at the sentence level rather than searching the vast vocabulary word by word. Specifically, RPP breaks the prompt down into four patterns, tailors each pattern with a multi-agent framework, and combines the results. The personalized prompts then interact with the LLM (the environment) iteratively to boost its recommendation performance (the reward). Beyond RPP, our proposal RPP+ improves the scalability of the action space by dynamically refining the selected actions with LLMs throughout the iterative process. Extensive experiments on various datasets demonstrate the superiority of RPP/RPP+ over traditional recommender models, few-shot methods, and other prompt-based methods, underscoring the significance of instance-wise prompting in LLMs for recommendation. Our code is available at https://github.com/maowenyu-11/RPP.
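To make the loop concrete, here is a minimal Python sketch of instance-wise prompt personalization in the spirit of RPP: sentence-level candidates for four prompt patterns, one simple agent per pattern, and an LLM stub as the environment. The candidate sentences, the epsilon-greedy agents, and the reciprocal-rank reward are illustrative assumptions, not the authors' implementation (RPP learns multi-agent policies and uses ranking metrics such as NDCG).

```python
import random

# Illustrative sentence-level candidate pools for the four prompt patterns;
# the actual candidates in the paper differ.
PATTERNS = {
    "role": ["You are a movie recommender.",
             "You are an expert at modeling user preferences."],
    "history": ["The user recently watched: {items}.",
                "Watching history, oldest first: {items}."],
    "reasoning": ["Think step by step about the user's taste.",
                  "First infer preferences, then rank the candidates."],
    "format": ["Output a ranked list of candidate titles.",
               "Answer with titles only, one per line."],
}

def llm_rank(prompt):
    """Stand-in for querying the LLM (the environment); returns a ranking."""
    return ["item_a", "item_b", "item_c"]  # placeholder response

def reward(ranking, target):
    """Toy reward: reciprocal rank of the held-out item."""
    return 1.0 / (ranking.index(target) + 1) if target in ranking else 0.0

# One epsilon-greedy agent per pattern, a crude stand-in for RPP's
# multi-agent policy over sentence-level actions.
values = {p: [0.0] * len(c) for p, c in PATTERNS.items()}
counts = {p: [0] * len(c) for p, c in PATTERNS.items()}

def personalize(user_items, target, steps=20, eps=0.2):
    for _ in range(steps):
        actions = {p: (random.randrange(len(c)) if random.random() < eps
                       else max(range(len(c)), key=lambda i, p=p: values[p][i]))
                   for p, c in PATTERNS.items()}
        prompt = " ".join(PATTERNS[p][actions[p]] for p in PATTERNS)
        prompt = prompt.format(items=", ".join(user_items))
        r = reward(llm_rank(prompt), target)   # query the environment
        for p, i in actions.items():           # incremental mean update
            counts[p][i] += 1
            values[p][i] += (r - values[p][i]) / counts[p][i]
    best = {p: max(range(len(c)), key=lambda i, p=p: values[p][i])
            for p, c in PATTERNS.items()}
    return " ".join(PATTERNS[p][best[p]] for p in PATTERNS).format(
        items=", ".join(user_items))
```

Swapping the epsilon-greedy updates for learned policy networks, and the stub for real LLM calls, recovers the environment/reward structure the abstract describes.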
Related papers
- Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information [76.62949982303532]
We propose a parameter-efficient Large Language Model Bi-Tuning framework for sequential recommendation with collaborative information (Laser)
In our Laser, the prefix is utilized to incorporate user-item collaborative information and adapt the LLM to the recommendation task, while the suffix converts the output embeddings of the LLM from the language space to the recommendation space for the follow-up item recommendation.
M-Former is a lightweight MoE-based querying transformer that uses a set of query experts to integrate diverse user-specific collaborative information encoded by frozen ID-based sequential recommender systems.
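A rough sketch of the bi-tuning idea, assuming a frozen backbone and collapsing the MoE-based M-Former into a single linear projection; all names and dimensions here are illustrative, not Laser's actual architecture:

```python
import torch
import torch.nn as nn

class BiTunedRecLLM(nn.Module):
    """Sketch of prefix/suffix bi-tuning around a frozen LLM backbone.
    `backbone` is any module mapping (B, T, d) -> (B, T, d)."""
    def __init__(self, backbone, d_model=768, n_prefix=8, n_items=10000, d_collab=64):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # LLM stays frozen; only prefix/suffix train
        # Prefix: project collaborative user embeddings (e.g., from a frozen
        # ID-based sequential recommender) into n_prefix virtual tokens.
        self.prefix_proj = nn.Linear(d_collab, n_prefix * d_model)
        self.n_prefix, self.d_model = n_prefix, d_model
        # Suffix: map the LLM's final hidden state from language space
        # to item scores in the recommendation space.
        self.suffix_head = nn.Linear(d_model, n_items)

    def forward(self, token_embs, collab_emb):
        # token_embs: (B, T, d_model); collab_emb: (B, d_collab)
        prefix = self.prefix_proj(collab_emb).view(-1, self.n_prefix, self.d_model)
        h = self.backbone(torch.cat([prefix, token_embs], dim=1))
        return self.suffix_head(h[:, -1])  # scores over the item catalog

# Toy usage with an identity "backbone" standing in for the frozen LLM:
model = BiTunedRecLLM(nn.Identity(), d_model=16, n_prefix=2, n_items=50, d_collab=8)
scores = model(torch.randn(3, 5, 16), torch.randn(3, 8))  # shape (3, 50)
```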
arXiv Detail & Related papers (2024-09-03T04:55:03Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
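One way to picture the offline side of this, as a sketch: turn logged benchmark runs into (query, good prompt) training pairs for the small prompt-generator model. The reward-filtered selection below is a simplification of QPO's multi-loop offline RL objective, and the data layout is an assumption.

```python
from dataclasses import dataclass

@dataclass
class LoggedRun:
    query: str      # the task input
    prompt: str     # the prompt that was benchmarked with it
    score: float    # task metric observed for (prompt, query)

def build_offline_dataset(logs, top_k=1):
    """Keep the top-k scoring prompts per query as (input, target) pairs.
    A small LM can then be fine-tuned to map query -> good prompt, and the
    re-benchmarked outputs feed the next loop (multi-loop refinement)."""
    by_query = {}
    for run in logs:
        by_query.setdefault(run.query, []).append(run)
    pairs = []
    for query, runs in by_query.items():
        for run in sorted(runs, key=lambda r: r.score, reverse=True)[:top_k]:
            pairs.append((query, run.prompt))
    return pairs
```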
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- PeaPOD: Personalized Prompt Distillation for Generative Recommendation [11.27949757550442]
We propose a PErsonAlized PrOmpt Distillation (PeaPOD) approach to distill user preferences as personalized soft prompts.
Considering the complexities of user preferences in the real world, we maintain a shared set of learnable prompts that are dynamically weighted based on the user's interests.
Experimental results on three real-world datasets demonstrate the effectiveness of our PeaPOD model on sequential recommendation, top-n recommendation, and explanation generation tasks.
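A minimal sketch of a shared, dynamically weighted prompt pool of this kind, with the pool sizes and the dot-product weighting as illustrative assumptions:

```python
import torch
import torch.nn as nn

class PersonalizedSoftPrompt(nn.Module):
    """Shared pool of learnable soft prompts, weighted per user."""
    def __init__(self, n_prompts=16, prompt_len=4, d_model=512, d_user=64):
        super().__init__()
        # Shared pool: n_prompts soft prompts of prompt_len virtual tokens each.
        self.pool = nn.Parameter(torch.randn(n_prompts, prompt_len, d_model))
        self.keys = nn.Parameter(torch.randn(n_prompts, d_user))

    def forward(self, user_emb):
        # user_emb: (B, d_user) -> interest-based weights over the pool.
        w = torch.softmax(user_emb @ self.keys.T, dim=-1)     # (B, n_prompts)
        # Weighted sum collapses the pool into one personalized soft prompt,
        # which is then prepended to the input embeddings of the LLM.
        return torch.einsum("bn,nld->bld", w, self.pool)       # (B, prompt_len, d)
```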
arXiv Detail & Related papers (2024-07-06T09:58:58Z)
- Few-shot Personalization of LLMs with Mis-aligned Responses [40.0349773257245]
This paper proposes a new approach for few-shot personalization of large language models (LLMs).
Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs.
During an iterative process of prompt improvement, we incorporate the contexts of mis-aligned responses by LLMs.
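The iterative refinement loop might look roughly like the following sketch, where the meta-prompt wording, the exact-match alignment check, and the stopping rule are all assumptions rather than the paper's procedure:

```python
def personalize_prompt(llm, user_examples, prompt="Answer as this user would.", rounds=3):
    """Iteratively rewrite a personalization prompt using mis-aligned cases.
    llm(text) -> str is a stand-in for a chat-completion call.
    user_examples: list of (query, gold_response) pairs for one user."""
    for _ in range(rounds):
        misaligned = []
        for query, gold in user_examples:
            pred = llm(f"{prompt}\n\nQuery: {query}")
            if pred.strip() != gold.strip():   # crude alignment check
                misaligned.append((query, gold, pred))
        if not misaligned:
            break
        # Feed the failure contexts back into the LLM to improve the prompt.
        context = "\n".join(
            f"Query: {q}\nUser's answer: {g}\nPrompt's answer: {p}"
            for q, g, p in misaligned
        )
        prompt = llm(
            "Improve this personalization prompt so the answers match the user.\n"
            f"Current prompt: {prompt}\nMis-aligned cases:\n{context}\n"
            "Return only the improved prompt."
        )
    return prompt
```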
arXiv Detail & Related papers (2024-06-26T18:29:12Z)
- Selective Prompting Tuning for Personalized Conversations with LLMs [31.28284591597932]
We propose Selective Prompt Tuning (SPT), which softly prompts large language models (LLMs) for personalized conversations in a selective way.
SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators.
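In contrast to PeaPOD's weighted mixture above, selective prompting picks from a prompt bank; a sketch, with the linear selector and Gumbel-softmax relaxation as assumptions about the design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectivePrompt(nn.Module):
    """Pick one soft prompt from a small bank per dialogue context."""
    def __init__(self, n_prompts=6, prompt_len=8, d_model=512):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, prompt_len, d_model))
        self.selector = nn.Linear(d_model, n_prompts)  # scores prompts per context

    def forward(self, ctx_emb, hard=True):
        logits = self.selector(ctx_emb)                    # (B, n_prompts)
        if hard:
            idx = logits.argmax(dim=-1)                    # discrete choice
            return self.prompts[idx]                       # (B, prompt_len, d)
        w = F.gumbel_softmax(logits, tau=1.0, hard=False)  # differentiable proxy
        return torch.einsum("bn,nld->bld", w, self.prompts)
```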
arXiv Detail & Related papers (2024-06-26T09:03:52Z)
- Prompt Optimization with Human Feedback [69.95991134172282]
We study the problem of prompt optimization with human feedback (POHF)
We introduce our algorithm named automated POHF (APOHF)
The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances.
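A toy version of learning from pairwise preference feedback; the Elo-style update and random pair selection below stand in for APOHF's actual, more principled strategy (which chooses pairs adaptively):

```python
import random

def preference_loop(prompts, ask_preference, budget=20):
    """Search for a good prompt from pairwise human preferences.
    ask_preference(a, b) -> a or b, the human's preferred prompt."""
    rating = {p: 0.0 for p in prompts}
    for _ in range(budget):
        a, b = random.sample(prompts, 2)   # APOHF would pick this pair adaptively
        winner = ask_preference(a, b)
        loser = b if winner == a else a
        # Elo-style update: surprise wins move ratings more.
        expected = 1 / (1 + 10 ** ((rating[loser] - rating[winner]) / 400))
        rating[winner] += 32 * (1 - expected)
        rating[loser] -= 32 * (1 - expected)
    return max(rating, key=rating.get)
```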
arXiv Detail & Related papers (2024-05-27T16:49:29Z)
- Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search [9.243535345193711]
Our method uses large language models to guide a single human worker in generating personalized dialogues.
LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations.
Our results show that responses generated explicitly using extracted preferences better match users' actual preferences.
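As a sketch, the guided collection loop could be organized as below; the prompt wording and the preference-extraction step are assumptions about the protocol, not LAPS itself:

```python
def guided_session(llm, human_write, turns=5):
    """Collect one dialogue with an LLM guiding a single human worker.
    llm(text) -> str is a stand-in for a chat-completion call;
    human_write(guidance) -> str is the worker writing a turn given guidance."""
    dialogue, preferences = [], []
    for _ in range(turns):
        guidance = llm("Suggest what the user should ask next, given: " + str(dialogue))
        user_turn = human_write(guidance)   # the human stays in the loop
        dialogue.append(("user", user_turn))
        # Extract explicit preferences so later responses can condition on them.
        preferences.append(llm("Extract stated preferences from: " + user_turn))
        reply = llm(f"Known preferences: {preferences}\nRespond to: {user_turn}")
        dialogue.append(("system", reply))
    return dialogue, preferences
```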
arXiv Detail & Related papers (2024-05-06T13:53:03Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
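A DPO-like rendering of the idea as a sketch, where contrast pairs drawn from identical or related prompts are weighted by prompt similarity; the exact RPO weighting scheme differs:

```python
import torch
import torch.nn.functional as F

def relative_preference_loss(logp_w, logp_l, prompt_sim, beta=0.1):
    """Contrast preferred vs. dispreferred responses across prompt pairs.
    logp_w: (B,) log-probs of preferred responses under the policy
    logp_l: (B,) log-probs of dispreferred responses (possibly from
            a different but related prompt)
    prompt_sim: (B,) similarity in [0, 1] of the two prompts in each pair."""
    margins = beta * (logp_w - logp_l)        # reward margin per contrast pair
    losses = -F.logsigmoid(margins)           # DPO-style binary preference loss
    weights = prompt_sim / prompt_sim.sum()   # more similar pairs weigh more
    return (weights * losses).sum()
```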
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach [83.62750225073341]
We consider recommendation as instruction following by large language models (LLMs)
We first design a general instruction format for describing the preference, intention, task form and context of a user in natural language.
Then we manually design 39 instruction templates and automatically generate a large amount of user-personalized instruction data.
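A sketch of what such an instruction format might look like; the field names and wording are illustrative, not one of the paper's 39 templates:

```python
# Illustrative instruction format covering the four described facets:
# preference, intention, task form, and context.
TEMPLATE = (
    "The user prefers {preference}. "
    "The user currently wants {intention}. "
    "Task: {task_form}. "
    "Context: {context}"
)

def build_instruction(preference, intention, task_form, context):
    return TEMPLATE.format(
        preference=preference, intention=intention,
        task_form=task_form, context=context,
    )

example = build_instruction(
    preference="science-fiction films with strong world-building",
    intention="a movie for a weekend family night",
    task_form="rank the candidate list by how well it fits the user",
    context="candidates: Dune, Up, Arrival, Frozen",
)
```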
arXiv Detail & Related papers (2023-05-11T17:39:07Z)
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
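At inference time the pipeline can be sketched as below; the hint wording is an assumption, and the RL training of the policy model on downstream rewards is omitted:

```python
def directional_stimulus_query(policy_model, black_box_llm, instance):
    """Compose an instance-specific stimulus with the original input.
    policy_model(instance) -> str: a small tunable model emitting a hint
    (e.g., keywords) for this instance; black_box_llm(prompt) -> str."""
    stimulus = policy_model(instance)   # instance-specific directional hint
    prompt = (
        f"{instance}\n"
        f"Hint: {stimulus}\n"           # the stimulus steers the frozen LLM
        "Answer following the hint."
    )
    return black_box_llm(prompt)
```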
arXiv Detail & Related papers (2023-02-22T17:44:15Z)