Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models
- URL: http://arxiv.org/abs/2506.11452v1
- Date: Fri, 13 Jun 2025 04:03:09 GMT
- Title: Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models
- Authors: Jieran Li, Xiuyuan Hu, Yang Zhao, Shengyao Zhuang, Hao Zhang
- Abstract summary: RefRank is a simple and effective comparative ranking method based on a fixed reference document. We show that RefRank significantly outperforms Pointwise baselines and achieves performance at least on par with Pairwise approaches.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated exceptional performance in the task of text ranking for information retrieval. While Pointwise ranking approaches offer computational efficiency by scoring documents independently, they often yield biased relevance estimates due to the lack of inter-document comparisons. In contrast, Pairwise methods improve ranking accuracy by explicitly comparing document pairs, but suffer from substantial computational overhead with quadratic complexity ($O(n^2)$). To address this trade-off, we propose RefRank, a simple and effective comparative ranking method based on a fixed reference document. Instead of comparing all document pairs, RefRank prompts the LLM to evaluate each candidate relative to a shared reference anchor. By selecting a reference anchor that encapsulates the core query intent, RefRank implicitly captures relevance cues, enabling indirect comparison between documents via this common anchor. This reduces computational cost to linear time ($O(n)$) while, importantly, preserving the advantages of comparative evaluation. To further enhance robustness, we aggregate multiple RefRank outputs using a weighted averaging scheme across different reference choices. Experiments on several benchmark datasets and with various LLMs show that RefRank significantly outperforms Pointwise baselines and achieves performance at least on par with Pairwise approaches at a significantly lower computational cost.
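The anchor-comparison idea can be sketched in a few lines. Below is a minimal illustration, not the paper's implementation: `compare` is a hypothetical LLM call that scores a candidate against the shared anchor for a query, and the weighted averaging over several anchor choices follows the abstract's description.

```python
from typing import Callable, List

def refrank(
    query: str,
    docs: List[str],
    anchor_ids: List[int],
    anchor_weights: List[float],
    compare: Callable[[str, str, str], float],  # hypothetical LLM call:
) -> List[int]:                                 # compare(query, doc, anchor) -> [0, 1]
    """Rank documents by comparing each candidate to shared reference anchors.

    One anchor costs O(n) LLM calls instead of the O(n^2) of full pairwise
    comparison; scores from several anchors are combined by weighted
    averaging, as the abstract describes.
    """
    scores = [0.0] * len(docs)
    for a, w in zip(anchor_ids, anchor_weights):
        anchor = docs[a]
        for i, doc in enumerate(docs):
            scores[i] += w * compare(query, doc, anchor)
    # Indices of documents, most relevant first.
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With m anchors this costs O(m·n) LLM calls, which stays linear in n for fixed m; how the anchors are chosen (e.g., from the top of a first-stage retrieval list) is left abstract here.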
Related papers
- Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales [3.4068099825211986]
The two most common prompts to elicit relevance judgments are pointwise scoring and listwise ranking. The current research community consensus is that listwise ranking yields superior performance. In tension with this hypothesis, we find that the gap between pointwise scoring and listwise ranking shrinks when pointwise scoring is implemented using a sufficiently large ordinal relevance label space.
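As a rough illustration of the pointwise side of that comparison, here is a prompt template with a configurable ordinal label space; the wording and the 0-100 scale are assumptions, not the paper's exact templates.

```python
def pointwise_prompt(query: str, passage: str, num_labels: int = 101) -> str:
    """Pointwise relevance prompt on a fine-grained ordinal scale.

    The finding above suggests that widening the label space (e.g., 0-100
    rather than 0-3) gives the model room to express distinctions that
    listwise ranking otherwise captures implicitly.
    """
    return (
        f"On an integer scale from 0 to {num_labels - 1}, where "
        f"{num_labels - 1} means perfectly relevant, rate the relevance "
        f"of the passage to the query.\n"
        f"Query: {query}\nPassage: {passage}\nScore:"
    )
```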
arXiv Detail & Related papers (2025-05-25T21:41:35Z)
- Using tournaments to calculate AUROC for zero-shot classification with LLMs [4.270472870948892]
Large language models perform surprisingly well on many zero-shot classification tasks. We propose and evaluate a method that converts binary classification tasks into pairwise comparison tasks. Repeated pairwise comparisons can be used to score instances using the Elo rating system.
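The Elo mechanism itself is standard. A minimal sketch, assuming a hypothetical `llm_prefers(a, b)` judge that returns True when the LLM ranks instance `a` above instance `b`:

```python
import itertools
import random

def elo_scores(items, llm_prefers, k=32.0, rounds=3, base=1500.0, seed=0):
    """Score instances via repeated pairwise LLM comparisons (Elo).

    Higher final ratings indicate instances the judge more consistently
    prefers, which can then be thresholded or used to compute AUROC.
    """
    rng = random.Random(seed)
    rating = {x: base for x in items}
    pairs = list(itertools.combinations(items, 2))
    for _ in range(rounds):
        rng.shuffle(pairs)
        for a, b in pairs:
            # Expected score of `a` under the Elo model.
            expected_a = 1.0 / (1.0 + 10 ** ((rating[b] - rating[a]) / 400.0))
            outcome_a = 1.0 if llm_prefers(a, b) else 0.0
            rating[a] += k * (outcome_a - expected_a)
            rating[b] += k * ((1.0 - outcome_a) - (1.0 - expected_a))
    return rating
```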
arXiv Detail & Related papers (2025-02-20T20:13:20Z)
- Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
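For context, here is a minimal sketch of the sliding-window paradigm this paper seeks to move beyond; `rerank_window` is a hypothetical stand-in for one listwise LLM call over a small candidate slice.

```python
from typing import Callable, List

def sliding_window_rerank(
    docs: List[str],
    rerank_window: Callable[[List[str]], List[str]],  # one listwise LLM call
    window: int = 20,
    stride: int = 10,
) -> List[str]:
    """Rerank overlapping windows from the bottom of the list upward,
    so strong candidates bubble toward the top across passes."""
    ranked = list(docs)
    start = max(len(ranked) - window, 0)
    while True:
        ranked[start:start + window] = rerank_window(ranked[start:start + window])
        if start == 0:
            break
        start = max(start - stride, 0)
    return ranked
```

Because each pass only compares documents inside one window, scores are never globally comparable across windows, which is the limitation the self-calibrated global relevance scores address.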
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Top-Down Partitioning for Efficient List-Wise Ranking [24.600506147325717]
We propose a novel algorithm that partitions a ranking to depth k and processes documents top-down.
Our algorithm is inherently parallelizable due to the use of a pivot element, which can be compared to documents down to an arbitrary depth concurrently.
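A rough sketch of the pivot idea under simplifying assumptions: `prefer` is a hypothetical pairwise LLM comparison, `rank_list` a hypothetical listwise call on small sets, and only a top-k prefix is returned.

```python
from typing import Callable, List

def topdown_rank(
    docs: List[str],
    k: int,
    prefer: Callable[[str, str], bool],           # hypothetical pairwise LLM call
    rank_list: Callable[[List[str]], List[str]],  # hypothetical listwise LLM call
) -> List[str]:
    """Return a ranked top-k prefix via pivot-based partitioning.

    Every candidate is compared against the same pivot, so the comparisons
    within one partitioning step are independent and can run in parallel;
    documents that lose to the pivot are processed only if the top-k
    prefix is not yet full.
    """
    if k <= 0 or not docs:
        return []
    if len(docs) <= k:
        return rank_list(docs)
    pivot, rest = docs[0], docs[1:]
    above = [d for d in rest if prefer(d, pivot)]      # beat the pivot
    below = [d for d in rest if not prefer(d, pivot)]
    if len(above) >= k:
        return topdown_rank(above, k, prefer, rank_list)
    head = rank_list(above + [pivot])
    return head + topdown_rank(below, k - len(head), prefer, rank_list)
```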
arXiv Detail & Related papers (2024-05-23T14:00:26Z)
- Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons [10.94304714004328]
This paper introduces a Product of Experts (PoE) framework for efficient Comparative Assessment.
Individual comparisons are considered experts that provide information on a pair's score difference.
The PoE framework combines the information from these experts to yield an expression that can be maximized with respect to the underlying set of candidates.
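Under a Gaussian assumption about the experts (my assumption for illustration, not necessarily the paper's exact choice), maximizing the product of experts reduces to a least-squares problem over the latent candidate scores:

```python
from typing import List, Tuple
import numpy as np

def poe_scores(n: int, comparisons: List[Tuple[int, int, float]]) -> np.ndarray:
    """Recover latent candidate scores from pairwise score-difference experts.

    Each comparison (i, j, d) is treated as a Gaussian expert asserting
    s_i - s_j ~ N(d, 1); the product of these experts is maximized by
    solving the corresponding least-squares system for s.
    """
    rows, targets = [], []
    for i, j, d in comparisons:
        row = np.zeros(n)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        targets.append(d)
    # Scores are identified only up to a constant shift, so pin s_0 = 0.
    anchor = np.zeros(n)
    anchor[0] = 1.0
    rows.append(anchor)
    targets.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return solution
```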
arXiv Detail & Related papers (2024-05-09T16:45:27Z)
- Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References [123.39034752499076]
Div-Ref is a method that enhances evaluation benchmarks by increasing the number of references.
We conduct experiments to empirically demonstrate that diversifying the expression of the references can significantly enhance the correlation between automatic and human evaluation.
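The core recipe reduces to scoring against an enriched reference set and keeping the best match. A minimal sketch with hypothetical `paraphrase` and `metric` callables:

```python
from typing import Callable, List

def div_ref_score(
    hypothesis: str,
    reference: str,
    paraphrase: Callable[[str, int], List[str]],  # hypothetical LLM paraphraser
    metric: Callable[[str, str], float],          # any single-reference metric
    n_refs: int = 5,
) -> float:
    """Score against an enriched reference set and keep the best match,
    so a valid output phrased unlike the single gold reference is not
    unfairly penalized."""
    references = [reference] + paraphrase(reference, n_refs)
    return max(metric(hypothesis, r) for r in references)
```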
arXiv Detail & Related papers (2023-05-24T11:53:29Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
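A minimal sketch of the listwise prompt-and-parse loop; the prompt wording is illustrative, not LRL's exact template.

```python
import re
from typing import List

def listwise_prompt(query: str, passages: List[str]) -> str:
    """Build one listwise prompt asking for a reordered list of identifiers."""
    body = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Rank the passages below by relevance to the query. Answer only "
        "with identifiers, most relevant first, e.g. [2] > [1] > [3].\n"
        f"Query: {query}\n{body}\nRanking:"
    )

def parse_ranking(completion: str, n: int) -> List[int]:
    """Extract the permutation; append any identifiers the model omitted."""
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", completion):
        idx = int(match) - 1
        if 0 <= idx < n and idx not in order:
            order.append(idx)
    return order + [i for i in range(n) if i not in order]
```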
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking [11.635294568328625]
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost.
It utilizes precomputed document representations extracted by a base dense retrieval method.
It incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method.
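The negligible-overhead claim follows from operating only on vectors that already exist. A toy sketch of that pattern (CODER's actual query contextualization against the candidate set is more involved):

```python
import numpy as np

def rerank_with_precomputed(query_vec: np.ndarray,
                            doc_vecs: np.ndarray) -> np.ndarray:
    """Rerank candidates using only precomputed document embeddings.

    `doc_vecs` is an (n_docs, dim) matrix produced offline by the base
    dense retriever; at run time the only cost is one matrix-vector
    product, which is why the run-time overhead is negligible.
    """
    scores = doc_vecs @ query_vec   # inner-product relevance
    return np.argsort(-scores)      # candidate indices, best first
```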
arXiv Detail & Related papers (2021-12-16T10:25:26Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
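As a simplified illustration of elimination-based active sampling, here is a successive-elimination sketch for the single most preferred item, with a hypothetical `noisy_prefers` comparator; the paper additionally models heterogeneous users and recovers a full ranking.

```python
import math
import random

def eliminate_best(items, noisy_prefers, delta=0.05, max_rounds=10_000, seed=0):
    """Successive-elimination sketch for the most preferred item.

    Surviving items are repeatedly compared against random opponents;
    an item is dropped once its Hoeffding upper confidence bound falls
    below the best surviving lower bound.
    """
    rng = random.Random(seed)
    alive = list(items)
    wins = {x: 0 for x in alive}
    trials = {x: 0 for x in alive}
    for t in range(1, max_rounds + 1):
        if len(alive) == 1:
            break
        for x in alive:
            opponent = rng.choice([y for y in alive if y != x])
            wins[x] += int(noisy_prefers(x, opponent))
            trials[x] += 1
        radius = math.sqrt(math.log(2 * len(items) * t / delta) / (2 * t))
        rates = {x: wins[x] / trials[x] for x in alive}
        best_lower = max(rates[x] - radius for x in alive)
        alive = [x for x in alive if rates[x] + radius >= best_lower]
    return max(alive, key=lambda x: wins[x] / max(trials[x], 1))
```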
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our method significantly outperforms existing methods on both multi-document and single-document summarization evaluation.
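A minimal sketch of the two components computed from precomputed sentence embeddings; the exact weighting and combination here are illustrative assumptions.

```python
import numpy as np

def summary_quality(doc_sents: np.ndarray, sum_sents: np.ndarray) -> float:
    """Training-free, reference-free score from sentence embeddings.

    Relevance: similarity of summary sentences to source sentences,
    weighted by each source sentence's centrality (its mean similarity
    to the rest of the document). Redundancy: how similar the summary's
    sentences are to each other.
    """
    def normalize(m: np.ndarray) -> np.ndarray:
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    d, s = normalize(doc_sents), normalize(sum_sents)
    centrality = (d @ d.T).mean(axis=1)
    centrality = centrality / centrality.sum()          # weights over source sentences
    relevance = float(centrality @ (d @ s.T).max(axis=1))
    pairwise = s @ s.T
    off_diag = pairwise[~np.eye(len(s), dtype=bool)]
    redundancy = float(off_diag.mean()) if len(s) > 1 else 0.0
    return relevance - redundancy
```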
arXiv Detail & Related papers (2021-06-26T05:11:27Z)