Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders
- URL: http://arxiv.org/abs/2601.05588v1
- Date: Fri, 09 Jan 2026 07:16:28 GMT
- Title: Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders
- Authors: Benjamin Rozonoyer, Chong You, Michael Boratko, Himanshu Jain, Nilesh Gupta, Srinadh Bhojanapalli, Andrew McCallum, Felix Yu
- Abstract summary: We show that the expressivity of pointwise generative ranking with multi-token docIDs is superior to that of dual encoders. We propose SToICaL - a Simple Token-Item Calibrated Loss - which can incorporate rank-aware supervision at both the item and token levels.
- Score: 37.16464474575651
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dual and cross encoders have long been mainstays of information retrieval (IR), but are being challenged by the emergent capabilities of LLMs. An LLM-based approach we term pointwise generative ranking - generating only the tokens of a single docID rather than a full list, and ranking candidates via beam search - combines efficiency and expressivity benefits while leveraging the in-context capabilities of Causal Transformers. Although there is ample evidence to suggest that pretrained LLMs are well-suited for ranking, we find that the vast majority of LLM-based approaches rely on next-token prediction, a loss function which is fundamentally rank-agnostic (and especially so with pointwise supervision). In this paper, we first prove that the expressivity of pointwise generative ranking with multi-token docIDs is superior to that of dual encoders. We then propose SToICaL - a Simple Token-Item Calibrated Loss - which can incorporate rank-aware supervision at both the item and token levels within the pointwise setup. We run a suite of experiments on ranking tasks derived from WordNet (Fellbaum, 1998) and ESCI (Reddy et al., arXiv:2206.06588). Two variants of SToICaL successfully suppress the probability of invalid docID generations and improve on common ranking metrics beyond top-1 retrieval.
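To make the pointwise setup concrete, below is a minimal sketch of pointwise generative ranking with multi-token docIDs, together with one plausible item-level rank-aware loss. The names (ToyCausalLM, docid_log_likelihood, item_level_calibrated_loss) and the loss form are illustrative assumptions, not the paper's code: the abstract does not spell out SToICaL's formulation, and for brevity the sketch scores an explicit candidate set rather than running beam search over a docID trie.

```python
# Minimal sketch (not the authors' implementation) of pointwise generative
# ranking: each multi-token docID is scored by its log-likelihood under a
# causal LM conditioned on the query, and candidates are ranked by that score.
# The "calibrated" loss below is only one plausible item-level variant:
# a softmax over docID log-likelihoods matched to graded relevance labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCausalLM(nn.Module):
    """Toy causal LM standing in for a pretrained causal Transformer
    (a GRU backbone keeps the sketch dependency-free but left-to-right)."""
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):                    # ids: (batch, seq)
        h, _ = self.rnn(self.embed(ids))
        return self.head(h)                    # logits: (batch, seq, vocab)

def docid_log_likelihood(model, query_ids, docid_ids):
    """Pointwise score: sum of token log-probs of the docID given the query."""
    ids = torch.cat([query_ids, docid_ids], dim=-1).unsqueeze(0)
    logits = model(ids)[0]                                 # (seq, vocab)
    start = query_ids.size(-1)
    # Positions start-1 .. seq-2 predict the docID tokens.
    logp = F.log_softmax(logits[start - 1:-1], dim=-1)     # (len(docid), vocab)
    return logp.gather(-1, docid_ids.unsqueeze(-1)).sum()

def item_level_calibrated_loss(model, query_ids, docids, relevances):
    """Illustrative item-level rank-aware loss: KL between a softmax over
    docID log-likelihoods and a softmax over graded relevance labels."""
    scores = torch.stack([docid_log_likelihood(model, query_ids, d) for d in docids])
    target = F.softmax(torch.tensor(relevances, dtype=torch.float), dim=-1)
    return F.kl_div(F.log_softmax(scores, dim=-1), target, reduction="batchmean")

# Usage: rank three 3-token docIDs for one query and take a gradient step.
model = ToyCausalLM()
query = torch.tensor([5, 17, 42])
docids = [torch.tensor([7, 8, 9]), torch.tensor([7, 8, 3]), torch.tensor([2, 2, 2])]
ranking = sorted(range(len(docids)),
                 key=lambda i: docid_log_likelihood(model, query, docids[i]).item(),
                 reverse=True)
print(ranking)
loss = item_level_calibrated_loss(model, query, docids, relevances=[3.0, 1.0, 0.0])
loss.backward()
```

Suppressing invalid docID generations, as the abstract describes, would additionally require constrained decoding (e.g., beam search restricted to prefixes of valid docIDs), which this sketch omits.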
Related papers
- LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy. Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
arXiv Detail & Related papers (2025-10-15T07:05:17Z) - Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization [7.7899746437628385]
We present Rank Anything First (RAF), a two-stage token optimization method. RAF crafts concise textual perturbations to consistently promote a target item in large language models. RAF generates ranking-promoting prompts token-by-token, guided by dual objectives: maximizing ranking effectiveness and preserving linguistic naturalness.
arXiv Detail & Related papers (2025-10-08T07:40:40Z) - Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales [3.4068099825211986]
The two most common prompts to elicit relevance judgments are pointwise scoring and listwise ranking. The current research community consensus is that listwise ranking yields superior performance. In tension with this hypothesis, we find that the gap between pointwise scoring and listwise ranking shrinks when pointwise scoring is implemented using a sufficiently large ordinal relevance label space.
arXiv Detail & Related papers (2025-05-25T21:41:35Z) - Reinforcement Speculative Decoding for Fast Ranking [9.584558586988953]
Large Language Models (LLMs) have been widely adopted in ranking systems such as information retrieval (IR) systems and recommender systems (RSs). We propose a Reinforcement Speculative Decoding method for fast ranking inference of LLMs.
arXiv Detail & Related papers (2025-05-23T02:25:26Z) - Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions [60.43398881149664]
We introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LLM Output Signature. It achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency.
arXiv Detail & Related papers (2025-03-18T09:04:37Z) - Order-agnostic Identifier for Large Language Model-based Generative Recommendation [94.37662915542603]
Items are assigned identifiers for Large Language Models (LLMs) to encode user history and generate the next item. Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings. We propose SETRec, which leverages semantic tokenizers to obtain order-agnostic multi-dimensional tokens.
arXiv Detail & Related papers (2025-02-15T15:25:38Z) - Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models [40.21540137079309]
Long-context Large Language Models (LLMs) enable full ranking of all passages within a single inference. We show that full ranking with long-context LLMs can deliver superior performance in the supervised fine-tuning setting. We propose a new complete listwise label construction approach and a novel importance-aware learning objective for full ranking.
arXiv Detail & Related papers (2024-12-19T06:44:59Z) - FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
We introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates (a minimal sketch of this single-token scoring follows after this list).
Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark.
Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z) - Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
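For the FIRST entry above, here is a minimal sketch of single-token listwise reranking under stated assumptions: it presumes the candidates have already been assigned single-token identifiers in a listwise prompt and that the logit vector for the first decoding step is available. The function name, toy logits, and identifier token ids are illustrative, not taken from the paper.

```python
# Sketch (not the authors' code) of FIRST-style reranking: candidates are
# labeled with single-token identifiers in a listwise prompt, and the logits
# of the FIRST generated token are read off to order all candidates at once,
# instead of decoding a full permutation.
import torch

def first_style_rerank(first_token_logits: torch.Tensor, id_token_ids: list[int]) -> list[int]:
    """Rank candidate indices by the logit their identifier token receives at
    the first decoding step. `first_token_logits` is the (vocab,) logit vector
    an LLM produces for the first output position; `id_token_ids[i]` is the
    vocabulary id of candidate i's single-token identifier."""
    scores = first_token_logits[id_token_ids]        # one logit per candidate
    return torch.argsort(scores, descending=True).tolist()

# Usage with toy logits standing in for a real LLM forward pass.
vocab_size = 50_000
logits = torch.randn(vocab_size)
candidate_ids = [1001, 1002, 1003, 1004]             # hypothetical ids for [1]..[4]
print(first_style_rerank(logits, candidate_ids))      # e.g. [2, 0, 3, 1]
```

Reading every candidate's score off a single decoding step is what lets this style of reranker avoid generating a full ranked sequence of identifiers.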
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.