Related papers: Direct Preference Optimization for LLM-Enhanced Recommendation Systems

URL: http://arxiv.org/abs/2410.05939v2
Date: Wed, 02 Apr 2025 06:22:49 GMT
Title: Direct Preference Optimization for LLM-Enhanced Recommendation Systems
Authors: Chao Sun, Yaobo Liang, Yaming Yang, Shilin Xu, Tianmeng Yang, Yunhai Tong
Abstract summary: Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains. We propose DPO4Rec, a framework that integrates DPO into LLM-enhanced recommendation systems. Extensive experiments show that DPO4Rec significantly improves re-ranking performance over strong baselines.
Score: 33.54698201942643
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains, motivating research into their potential for recommendation systems. Early efforts have leveraged LLMs' rich knowledge and strong generalization capabilities via in-context learning, where recommendation tasks are framed as prompts. However, LLM performance in recommendation scenarios remains limited due to the mismatch between their pretraining objectives and recommendation tasks, as well as the lack of recommendation-specific data during pretraining. To address these challenges, we propose DPO4Rec, a novel framework that integrates Direct Preference Optimization (DPO) into LLM-enhanced recommendation systems. First, we prompt the LLM to infer user preferences from historical interactions, which are then used to augment traditional ID-based sequential recommendation models. Next, we train a reward model based on knowledge-augmented recommendation architectures to assess the quality of LLM-generated reasoning. Using this, we select the highest- and lowest-ranked responses from N samples to construct a dataset for LLM fine-tuning. Finally, we apply a structure alignment strategy via DPO to align the LLM's outputs with desirable recommendation behavior. Extensive experiments show that DPO4Rec significantly improves re-ranking performance over strong baselines, demonstrating enhanced instruction-following capabilities of LLMs in recommendation tasks.
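The pair-construction and alignment steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `build_preference_pair`, `reward_fn`, and `dpo_loss` are hypothetical names, the reward function stands in for the paper's trained reward model, and the loss shown is the standard per-example DPO objective, which may differ from the paper's exact structure alignment formulation.

```python
import math
from typing import Callable, Dict, Sequence


def build_preference_pair(
    prompt: str,
    responses: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> Dict[str, str]:
    """Keep the highest- and lowest-scored of N sampled responses.

    reward_fn is a hypothetical stand-in for a trained reward model.
    """
    if len(responses) < 2:
        raise ValueError("need at least two sampled responses")
    ranked = sorted(responses, key=lambda r: reward_fn(prompt, r))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract does not state one
) -> float:
    """Standard per-example DPO loss: -log sigmoid(beta * margin)."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable form of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In practice the log-probabilities would be the summed token log-likelihoods of each response under the fine-tuned policy and a frozen reference model; the scalar version here only shows how the chosen/rejected pair enters the objective.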

Related papers

Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation [18.13513199455587]
We propose guided embedding refinement to enhance the embeddings associated with the base recommendation system. We generate guided embeddings that capture domain-relevant semantic information on interpretable attributes. The refined embedding achieves approximately 10% to 50% gains in Mean Reciprocal Rank (MRR), Recall rate, and Normalized Discounted Cumulative Gain (NDCG).
arXiv Detail & Related papers (2025-04-15T23:03:53Z)
Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment [69.11529841118671]
We propose a new Deliberative Recommendation task, which incorporates explicit reasoning about user preferences as an additional alignment goal. We then introduce the Reasoning-powered Recommender framework for deliberative user preference alignment.
arXiv Detail & Related papers (2025-02-04T07:17:54Z)
Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning [57.28766250993726]
This work explores adapting to dynamic user interests without any model updates. Existing Large Language Model (LLM)-based recommenders often lose the in-context learning ability during recommendation tuning. We propose RecICL, which customizes recommendation-specific in-context learning for real-time recommendations.
arXiv Detail & Related papers (2024-10-30T15:48:36Z)
Enhancing High-order Interaction Awareness in LLM-based Recommender Model [3.7623606729515133]
This paper presents an enhanced LLM-based recommender (ELMRec). We enhance whole-word embeddings to substantially improve LLMs' interpretation of graph-constructed interactions for recommendations. Our ELMRec outperforms state-of-the-art (SOTA) methods in both direct and sequential recommendations.
arXiv Detail & Related papers (2024-09-30T06:07:12Z)
On Softmax Direct Preference Optimization for Recommendation [50.896117978746]
We propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives. Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders.
arXiv Detail & Related papers (2024-06-13T15:16:11Z)
Finetuning Large Language Model for Personalized Ranking [12.16551080986962]
Large Language Models (LLMs) have demonstrated remarkable performance across various domains. Direct Multi-Preference Optimization (DMPO) is a framework designed to bridge the gap and enhance the alignment of LLMs for recommendation tasks.
arXiv Detail & Related papers (2024-05-25T08:36:15Z)
Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks. However, LLMs fall short in recognizing and utilizing temporal information, resulting in poor performance on tasks that require an understanding of sequential data. We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z)
Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation [23.182787000804407]
Large Language Models (LLMs) are emerging as promising approaches to enhance session-based recommendation (SBR). We propose a Reflective Reinforcement Large Language Model (Re2LLM) for SBR, guiding LLMs to focus on specialized knowledge essential for more accurate recommendations.
arXiv Detail & Related papers (2024-03-25T05:12:18Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning. We propose a novel method, termed "reflection-tuning". This approach utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of instructions and responses in the data.
arXiv Detail & Related papers (2023-10-18T05:13:47Z)
LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated. We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization. The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z)
A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP). This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.