K-order Ranking Preference Optimization for Large Language Models
- URL: http://arxiv.org/abs/2506.00441v1
- Date: Sat, 31 May 2025 07:46:42 GMT
- Title: K-order Ranking Preference Optimization for Large Language Models
- Authors: Shihao Cai, Chongming Gao, Yang Zhang, Wentao Shi, Jizhi Zhang, Keqin Bao, Qifan Wang, Fuli Feng,
- Abstract summary: We argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. We propose K-order Ranking Preference Optimization (KPO) by extending DPO's Plackett-Luce model to accommodate top-K rankings.
- Score: 43.249451587412544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main reasons: (1) users are typically concerned with only the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose K-order Ranking Preference Optimization (KPO) by extending DPO's Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine an appropriate K for different samples and introduce a curriculum learning strategy to boost training efficiency. Extensive experiments demonstrate the effectiveness of KPO, highlighting its high sample efficiency and robustness to noise. The code is available at https://github.com/Lanyu0303/KPO.
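For intuition, here is a minimal sketch of a top-K Plackett-Luce negative log-likelihood in the DPO style: only the first K ranking slots contribute loss terms, while each term still normalizes over all not-yet-ranked candidates. Variable names and the hyperparameter beta are illustrative; for the paper's actual implementation, see the linked repository.

    import torch

    def kpo_loss(policy_logps, ref_logps, k, beta=0.1):
        # policy_logps, ref_logps: shape (n,) sequence log-probs of the
        # n candidates, sorted so that index 0 is the top-ranked item.
        rewards = beta * (policy_logps - ref_logps)  # implicit DPO rewards
        nll = torch.zeros(())
        for i in range(k):  # only the first k slots are scored
            # log-probability that item i beats every still-unranked item
            nll = nll - (rewards[i] - torch.logsumexp(rewards[i:], dim=0))
        return nll

Setting k equal to the list length recovers full-order list-wise DPO; a smaller k simply drops the noisy tail from the objective, which is the motivation behind KPO.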
Related papers
- In-context Ranking Preference Optimization [48.36442791241395]
We propose an In-context Ranking Preference Optimization (IRPO) framework to optimize large language models (LLMs) based on ranking lists constructed during inference. We show IRPO outperforms standard DPO approaches in ranking performance, highlighting its effectiveness in aligning LLMs with direct in-context ranking preferences.
arXiv Detail & Related papers (2025-04-21T23:06:12Z)
- TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees [14.84379332031731]
We introduce Tree Preference Optimization (TPO), which does not sample paired preference responses from the preference tree. TPO formulates language model alignment as a Preference List Ranking problem. The experimental results indicate that TPO consistently outperforms DPO across five public large language models on four datasets.
arXiv Detail & Related papers (2024-10-10T22:22:05Z)
- Learning k-Determinantal Point Processes for Personalized Ranking [13.677246792673564]
We present a new optimization criterion, LkP, based on set probability comparison for personalized ranking.
LkP is broadly applicable, and when applied to existing recommendation models it also yields strong performance improvements.
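For context, a k-DPP assigns a size-k item set S an unnormalized probability det(L_S), the determinant of the kernel's principal submatrix. The sketch below shows how a set-level comparison could look; the BPR-style pairing of a preferred set against a sampled set is our reading of the abstract, not necessarily the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def kdpp_log_prob(L, items):
        # L: (n, n) positive semi-definite kernel; items: 1-D LongTensor
        # of k item indices. Returns log det(L_S); assumes L_S is
        # non-singular (add a small diagonal jitter otherwise).
        sub = L[items][:, items]
        return torch.logdet(sub)

    def lkp_pairwise_loss(L, pos_set, neg_set):
        # encourage P(pos_set) > P(neg_set), analogous to BPR over sets
        diff = kdpp_log_prob(L, pos_set) - kdpp_log_prob(L, neg_set)
        return -F.logsigmoid(diff)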
arXiv Detail & Related papers (2024-06-23T02:24:50Z)
- Lower-Left Partial AUC: An Effective and Efficient Optimization Metric for Recommendation [52.45394284415614]
We propose a new optimization metric, Lower-Left Partial AUC (LLPAUC), which is computationally efficient like AUC but strongly correlates with Top-K ranking metrics.
LLPAUC considers only the partial area under the ROC curve in the Lower-Left corner, focusing the optimization on Top-K.
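Reading "Lower-Left corner" as the ROC region with FPR <= alpha and TPR <= beta, a rough empirical estimator counts correctly ordered pairs among only the top-scored positives and negatives. The pair-counting form below is standard for partial AUC; the paper's exact normalization and surrogate losses may differ.

    import numpy as np

    def llpauc(pos_scores, neg_scores, alpha=0.3, beta=0.3):
        n_pos = max(1, int(np.ceil(beta * len(pos_scores))))
        n_neg = max(1, int(np.ceil(alpha * len(neg_scores))))
        top_pos = np.sort(pos_scores)[-n_pos:]  # positives reachable at TPR <= beta
        top_neg = np.sort(neg_scores)[-n_neg:]  # negatives reachable at FPR <= alpha
        # fraction of (positive, negative) pairs ranked correctly
        return (top_pos[:, None] > top_neg[None, :]).mean()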
arXiv Detail & Related papers (2024-02-29T13:58:33Z)
- LiPO: Listwise Preference Optimization through Learning-to-Rank [62.02782819559389]
The policy can learn more effectively from a ranked list of plausible responses given the prompt. We show that LiPO-$\lambda$ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks.
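LiPO-$\lambda$ draws on the LambdaLoss family from learning-to-rank. The sketch below weights a pairwise logistic term over implicit rewards by the NDCG delta of swapping the pair; this weighting scheme is our reading of the approach, not the paper's exact formula.

    import torch
    import torch.nn.functional as F

    def lipo_lambda_loss(rewards, labels):
        # rewards: (n,) implicit rewards beta * log(pi/pi_ref) for the
        # listed responses; labels: (n,) graded relevance, higher is better.
        n = rewards.shape[0]
        ranks = torch.arange(1, n + 1, dtype=torch.float)  # current list order
        gain = 2.0 ** labels.float() - 1.0
        disc = 1.0 / torch.log2(ranks + 1.0)
        loss = torch.zeros(())
        for i in range(n):
            for j in range(n):
                if labels[i] > labels[j]:
                    # |delta NDCG| of swapping i and j, LambdaLoss-style
                    delta = torch.abs((gain[i] - gain[j]) * (disc[i] - disc[j]))
                    # logistic loss on the correctly ordered pair
                    loss = loss + delta * F.softplus(rewards[j] - rewards[i])
        return loss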
arXiv Detail & Related papers (2024-02-02T20:08:10Z)
- Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems [33.46891569350896]
Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems.
Previous works on learning-to-rank usually focus on letting the model learn the complete order or the top-k order.
We name this method the Adaptive Neural Ranking Framework (ARF).
arXiv Detail & Related papers (2023-10-16T14:43:02Z)
- Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC).
AUTKC has better discrimination ability, and its Bayes optimal score function can give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
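One way to read the metric, assuming the "top-k curve" plots Top-j accuracy against j: the partial area up to K is then the average of the Top-j accuracies for j = 1..K. This reading is an assumption; see the paper for the formal definition and its Bayes analysis.

    import numpy as np

    def autkc(scores, targets, K):
        # scores: (n_samples, n_classes); targets: (n_samples,) int labels.
        order = np.argsort(-scores, axis=1)        # classes by descending score
        hits = order == targets[:, None]           # exactly one True per row
        topj = np.cumsum(hits[:, :K], axis=1).mean(axis=0)  # Top-j accuracy, j=1..K
        return topj.mean()                         # average area up to K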
arXiv Detail & Related papers (2022-09-03T11:09:13Z)
- PEAR: Personalized Re-ranking with Contextualized Transformer for Recommendation [48.17295872384401]
We present a personalized re-ranking model (dubbed PEAR) based on a contextualized transformer.
PEAR makes several major improvements over existing methods.
We also augment the training of PEAR with a list-level classification task to assess users' satisfaction with the whole ranking list.
arXiv Detail & Related papers (2022-03-23T08:29:46Z)