Related papers: Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

URL: http://arxiv.org/abs/2310.07712v2
Date: Mon, 22 Apr 2024 17:53:48 GMT
Title: Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
Authors: Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin, Ferhan Ture,
Abstract summary: Large language models (LLMs) exhibit positional bias in how they use context. We propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs. Our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B)
Score: 63.714662435555674
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking. To address this, we propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs. Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias. First, given some input prompt, we repeatedly shuffle the list in the prompt and pass it through the LLM while holding the instructions the same. Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process. Theoretically, we prove the robustness of our method, showing convergence to the true ranking in the presence of random perturbations. Empirically, on five list-ranking datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B), surpassing the previous state of the art in passage reranking. Our code is at https://github.com/castorini/perm-sc.

Related papers

Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach. This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
First, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark. Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z)
LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking [17.96316956366718]
Ranking passages by prompting a large language model (LLM) can achieve promising performance in modern information retrieval (IR) systems. We show that sorting-based methods require consistent comparisons to correctly sort the passages, which we show that LLMs often violate. We propose LLM-RankFusion, an LLM-based ranking framework that mitigates these inconsistencies and produces a robust ranking list.
arXiv Detail & Related papers (2024-05-31T23:29:42Z)
LiPO: Listwise Preference Optimization through Learning-to-Rank [62.02782819559389]
Policy can learn more effectively from a ranked list of plausible responses given the prompt.<n>We show that LiPO-$lambda$ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks.
arXiv Detail & Related papers (2024-02-02T20:08:10Z)
Unsupervised Contrast-Consistent Ranking with Language Models [24.696017700382665]
Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model's ranking knowledge. We find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce.
arXiv Detail & Related papers (2023-09-13T14:36:26Z)
Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank [40.81502990315285]
Learning-to-rank is a core technique in the top-N recommendation task, where an ideal ranker would be a mapping from an item set to an arrangement. Most existing solutions fall in the paradigm of probabilistic ranking principle (PRP), i.e., first score each item in the candidate set and then perform a sort operation to generate the top ranking list. We propose Set-To-Arrangement Ranking (STARank), a new framework directly generates the permutations of the candidate items without the need for individually scoring and sort operations.
arXiv Detail & Related papers (2023-08-05T12:22:26Z)
Partitioned Saliency Ranking with Dense Pyramid Transformers [4.449304130658638]
Saliency ranking has emerged as a challenging task focusing on assessing the degree of saliency at instance-level. Previous approaches undertake the saliency ranking by directly sorting the rank scores of salient instances, which have not explicitly resolved the inherent ambiguities. We propose the ranking by partition paradigm, which segments unordered salient instances into partitions and then ranks them based on the correlations among these partitions.
arXiv Detail & Related papers (2023-08-01T02:33:10Z)
Learning List-Level Domain-Invariant Representations for Ranking [59.3544317373004]
We propose list-level alignment -- learning domain-invariant representations at the higher level of lists. The benefits are twofold: it leads to the first domain adaptation generalization bound for ranking, in turn providing theoretical support for the proposed method.
arXiv Detail & Related papers (2022-12-21T04:49:55Z)
PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking. We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
arXiv Detail & Related papers (2020-12-12T05:07:36Z)
SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons. We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $sqrtM/N$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.