Towards Two-Stage Counterfactual Learning to Rank
- URL: http://arxiv.org/abs/2506.20854v3
- Date: Sat, 12 Jul 2025 14:36:33 GMT
- Title: Towards Two-Stage Counterfactual Learning to Rank
- Authors: Shashank Gupta, Yiming Liao, Maarten de Rijke
- Abstract summary: Counterfactual learning to rank aims to learn a ranking policy from user interactions. In real-world applications, the candidate document set is on the order of millions, making a single-stage ranking policy impractical. We propose a two-stage CLTR estimator that considers the interaction between the two stages and estimates the joint value of the two policies offline.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual learning to rank (CLTR) aims to learn a ranking policy from user interactions while correcting for the inherent biases in interaction data, such as position bias. Existing CLTR methods assume a single ranking policy that selects top-K ranking from the entire document candidate set. In real-world applications, the candidate document set is on the order of millions, making a single-stage ranking policy impractical. In order to scale to millions of documents, real-world ranking systems are designed in a two-stage fashion, with a candidate generator followed by a ranker. The existing CLTR method for a two-stage offline ranking system only considers the top-1 ranking set-up and only focuses on training the candidate generator, with the ranker fixed. A CLTR method for training both the ranker and candidate generator jointly is missing from the existing literature. In this paper, we propose a two-stage CLTR estimator that considers the interaction between the two stages and estimates the joint value of the two policies offline. In addition, we propose a novel joint optimization method to train the candidate and ranker policies, respectively. To the best of our knowledge, we are the first to propose a CLTR estimator and learning method for two-stage ranking. Experimental results on a semi-synthetic benchmark demonstrate the effectiveness of the proposed joint CLTR method over baselines.
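The abstract's central correction, debiasing clicks for position bias, is typically done with inverse propensity scoring (IPS). The sketch below illustrates that idea only; the `1/rank**eta` examination model and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def examination_propensity(ranks, eta=1.0):
    """A common position-bias model: P(examined | rank) = (1 / rank) ** eta.
    The eta parameter controls how steeply examination drops with rank."""
    return (1.0 / ranks) ** eta

def ips_relevance_estimate(clicks, ranks, eta=1.0):
    """Debias observed clicks by dividing each click by its examination
    propensity, so clicks at lower (less-examined) ranks are up-weighted."""
    return clicks / examination_propensity(ranks, eta)

# Toy click log: documents shown at ranks 1..4, clicks observed at ranks 1 and 3.
ranks = np.array([1.0, 2.0, 3.0, 4.0])
clicks = np.array([1.0, 0.0, 1.0, 0.0])
print(ips_relevance_estimate(clicks, ranks))
```

In a two-stage system, the paper's point is that such per-document corrections are not enough on their own: the value of the ranker's policy also depends on which documents the candidate generator surfaces, so the two policies must be evaluated and optimized jointly.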
Related papers
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task. Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z)
- A Self-boosted Framework for Calibrated Ranking [7.4291851609176645]
Calibrated Ranking is a scale-calibrated ranking system that pursues accurate ranking quality and calibrated probabilistic predictions simultaneously.
Previous methods need to aggregate the full candidate list within a single mini-batch to compute the ranking loss.
We propose a Self-Boosted framework for Calibrated Ranking (SBCR).
arXiv Detail & Related papers (2024-06-12T09:00:49Z)
- Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank [40.81502990315285]
Learning-to-rank is a core technique in the top-N recommendation task, where an ideal ranker would be a mapping from an item set to an arrangement.
Most existing solutions fall in the paradigm of probabilistic ranking principle (PRP), i.e., first score each item in the candidate set and then perform a sort operation to generate the top ranking list.
We propose Set-To-Arrangement Ranking (STARank), a new framework that directly generates permutations of the candidate items without the need for individual scoring and sorting operations.
arXiv Detail & Related papers (2023-08-05T12:22:26Z)
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting [65.00288634420812]
Pairwise Ranking Prompting (PRP) is a technique that significantly reduces the burden on Large Language Models (LLMs).
Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs.
arXiv Detail & Related papers (2023-06-30T11:32:25Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce [13.885695433738437]
Existing re-ranking methods directly take the initial ranking list as input and generate the optimal permutation through a well-designed context-wise model.
However, evaluating all candidate permutations brings unacceptable computational costs in practice.
This paper presents a novel end-to-end re-ranking framework named PIER to tackle the above challenges.
arXiv Detail & Related papers (2023-02-06T09:17:52Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer [35.199462901346706]
We propose to estimate a pairwise learning to rank model online.
In each round, candidate documents are partitioned and ranked according to the model's confidence on the estimated pairwise rank order.
A regret bound defined directly on the number of mis-ordered pairs is proven, which connects the online solution's theoretical convergence with its expected ranking performance.
arXiv Detail & Related papers (2021-02-28T01:16:55Z)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
- Hierarchical Ranking for Answer Selection [19.379777219863964]
We propose a novel strategy for answer selection, called hierarchical ranking.
We introduce three levels of ranking: point-level ranking, pair-level ranking, and list-level ranking.
Experimental results on two public datasets, WikiQA and TREC-QA, demonstrate that the proposed hierarchical ranking is effective.
arXiv Detail & Related papers (2021-02-01T07:35:52Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.