Related papers: On Negative-aware Preference Optimization for Recommendation

URL: http://arxiv.org/abs/2508.09653v1
Date: Wed, 13 Aug 2025 09:37:07 GMT
Title: On Negative-aware Preference Optimization for Recommendation
Authors: Chenlu Ding, Daoxuan Liu, Jiancan Wu, Xingyu Hu, Junkang Wu, Haitao Wang, Yongkang Wang, Xingxing Wang, Xiang Wang
Abstract summary: We propose NAPO, an enhanced framework for preference optimization in LLM-based recommendation. NAPO introduces two key innovations: (1) in-batch negative sharing, which expands the pool of negative samples without additional memory overhead, and (2) dynamic reward margin adjustment, which adapts model updates based on the confidence of negative samples.
Score: 10.082739500992545
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recommendation systems leverage user interaction data to suggest relevant items while filtering out irrelevant (negative) ones. The rise of large language models (LLMs) has garnered increasing attention for their potential in recommendation tasks. However, existing methods for optimizing LLM-based recommenders face challenges in effectively utilizing negative samples. Simply integrating large numbers of negative samples can improve ranking accuracy and mitigate popularity bias but often leads to increased computational overhead and memory costs. Additionally, current approaches fail to account for the varying informativeness of negative samples, leading to suboptimal optimization performance. To address these issues, we propose NAPO (Negative-Aware Preference Optimization), an enhanced framework for preference optimization in LLM-based recommendation. NAPO introduces two key innovations: (1) in-batch negative sharing, which expands the pool of negative samples without additional memory overhead, and (2) dynamic reward margin adjustment, which adapts model updates based on the confidence of negative samples. Extensive experiments on three public datasets demonstrate that NAPO outperforms existing methods in both recommendation accuracy and popularity bias reduction.
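Illustrative sketch (not the authors' implementation): the abstract names two ideas, in-batch negative sharing and a confidence-dependent reward margin, but does not give the loss. The Python below shows one plausible way such ideas could be combined in a softmax-style preference loss. Everything in it is an assumption made for illustration: the function name `in_batch_margin_loss`, the `rewards` matrix layout (row i holds user i's implicit rewards for every positive item in the batch, with the user's own positive on the diagonal), the `confidence` scores in [0, 1], and the linear margin `gamma * confidence`.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); equals -log(sigmoid(-x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logsumexp(values):
    # Stable log(sum(exp(v))) for a non-empty list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def in_batch_margin_loss(rewards, item_ids, confidence, gamma=1.0):
    """Hypothetical multi-negative preference loss with shared in-batch negatives.

    rewards[i][j]    : assumed implicit reward of user i for the positive item of user j
    item_ids[j]      : id of user j's positive item (used to mask duplicates)
    confidence[i][j] : assumed confidence in [0, 1] that item j is a true negative for user i
    gamma            : assumed scale turning confidence into a reward margin
    """
    losses = []
    for i in range(len(rewards)):
        terms = []
        for j in range(len(rewards)):
            if item_ids[j] == item_ids[i]:
                continue  # skip the user's own positive and any duplicate of it
            margin = gamma * confidence[i][j]
            # Negative reward minus positive reward, pushed up by the margin.
            terms.append(rewards[i][j] - rewards[i][i] + margin)
        if terms:
            # -log sigmoid(-log sum_j exp(term_j)) written via softplus.
            losses.append(softplus(logsumexp(terms)))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    rewards = [[2.0, 0.0], [0.0, 2.0]]
    zero_conf = [[0.0, 0.0], [0.0, 0.0]]
    full_conf = [[1.0, 1.0], [1.0, 1.0]]
    print(in_batch_margin_loss(rewards, [1, 2], zero_conf))
    print(in_batch_margin_loss(rewards, [1, 2], full_conf))
```

In this sketch, a larger margin for a confident negative raises the loss until the positive item's reward exceeds that negative's reward by more than the margin. How NAPO actually computes confidence and margins is specified in the paper, not here.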

Related papers

Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers [80.55429742713623]
ILRec is a novel preference fine-tuning framework for LLM-based recommender systems. We introduce a lightweight collaborative filtering model to assign token-level rewards for negative signals. Experiments on three datasets demonstrate ILRec's effectiveness in enhancing the performance of LLM-based recommender systems.
arXiv Detail & Related papers (2026-02-19T14:37:43Z)
Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z)
Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations. In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%. DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z)
SPRec: Self-Play to Debias LLM-based Recommendation [23.875509546540904]
Large language models (LLMs) have attracted significant attention in recommendation systems. We propose SPRec, a novel self-play framework designed to mitigate over-recommendation and improve fairness without requiring additional data or manual intervention.
arXiv Detail & Related papers (2024-12-12T12:53:30Z)
Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts [17.243429150450886]
We propose Multi-Preference Optimization (MPO) to optimize over entire sets of responses. MPO employs deviation-based weighting, which emphasizes outlier responses that deviate most from the mean reward. We theoretically prove that MPO reduces alignment bias at a rate of $\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$ with respect to the number of responses per query.
arXiv Detail & Related papers (2024-12-05T21:50:22Z)
Preference Diffusion for Recommendation [50.8692409346126]
We propose PreferDiff, a tailored optimization objective for DM-based recommenders. PreferDiff transforms BPR into a log-likelihood ranking objective to better capture user preferences. It is the first personalized ranking loss designed specifically for DM-based recommenders.
arXiv Detail & Related papers (2024-10-17T01:02:04Z)
RosePO: Aligning LLM-based Recommenders with Human Values [38.029251417802044]
We propose a general framework -- Recommendation with smoothing personalized Preference Optimization (RosePO). RosePO better aligns with customized human values during the post-training stage. Evaluation on three real-world datasets demonstrates the effectiveness of our method.
arXiv Detail & Related papers (2024-10-16T12:54:34Z)
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning [45.64632177923583]
This work studies the problem of large language model (LLM) unlearning, aiming to remove unwanted data influences. Despite the increasing demand for unlearning, a technically-grounded optimization framework is lacking. We propose a simple yet effective unlearning optimization framework, called SimNPO, showing that 'simplicity' in removing the reliance on a reference model benefits unlearning.
arXiv Detail & Related papers (2024-10-09T17:58:12Z)
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization [9.618391485742968]
Iterative preference optimization has recently become one of the de-facto training paradigms for large language models (LLMs). We present an uncertainty-enhanced Preference Optimization framework to make the LLM self-evolve with reliable feedback. Our framework substantially alleviates the noisy problem and improves the performance of iterative preference optimization.
arXiv Detail & Related papers (2024-09-17T14:05:58Z)
On Softmax Direct Preference Optimization for Recommendation [50.896117978746]
We propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives. Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders.
arXiv Detail & Related papers (2024-06-13T15:16:11Z)
Generating Negative Samples for Sequential Recommendation [83.60655196391855]
We propose to Generate Negative Samples (items) for Sequential Recommendation (SR). A negative item is sampled at each time step based on the current SR model's learned user preferences toward items. Experiments on four public datasets verify the importance of providing high-quality negative samples for SR.
arXiv Detail & Related papers (2022-08-07T05:44:13Z)