IRanker: Towards Ranking Foundation Model
- URL: http://arxiv.org/abs/2506.21638v1
- Date: Wed, 25 Jun 2025 17:56:06 GMT
- Title: IRanker: Towards Ranking Foundation Model
- Authors: Tao Feng, Zhigang Hua, Zijie Lei, Yan Xie, Shuang Yang, Bo Long, Jiaxuan You,
- Abstract summary: We propose to unify ranking tasks using a single ranking foundation model (FM)<n>IRanker is a ranking framework with reinforcement learning (RL) and iterative decoding.<n>We show that a single IRanker-3B achieves state-of-the-art results on several datasets.
- Score: 26.71771958251611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ranking tasks are ubiquitous, encompassing applications such as recommendation systems, LLM routing, and item re-ranking. We propose to unify these tasks using a single ranking foundation model (FM), as it eliminates the need for designing different models for each specific ranking task. However, unlike general supervision tasks in LLMs, ranking tasks do not have clear labels for supervision, posing great challenges to developing a ranking FM. To overcome these challenges, we propose IRanker, a ranking FM framework with reinforcement learning (RL) and iterative decoding. Our insight is to decompose the complex ranking task into an iterative decoding process that eliminates the worst candidate from the candidate pool step by step, which significantly reduces the output combinatorial space and better utilizes the limited context length during RL training. We meticulously train and comprehensively evaluate an IRanker-3B model on nine datasets across three scenarios: recommendation, routing, and passage ranking. The results show that a single IRanker-3B achieves state-of-the-art results on several datasets compared to models of similar size, and even surpasses the performance of larger models on certain datasets. We further demonstrate the effectiveness of our RL design and the robustness of the iterative mechanism across different LLM sizes. Moreover, we conducted both in-domain and out-of-domain zero-shot generalization experiments, which showed that IRanker-3B achieved good generalization on in-domain ranking tasks compared to the base LLM by at least 5% improvement. Surprisingly, on out-of-domain generic LLM tasks, IRanker-3B outperformed the base model by at least 9% on GSM8K, IFEval, and MathQA. In addition, the thoughts generated by IRanker-3B during training could further enhance zero-shot LLM performance.
Related papers
- Generative Reasoning Re-ranker [24.386586034456673]
Generative Reasoning Reranker (GR2) is an end-to-end framework with a three-stage training pipeline tailored for reranking.<n> GR2 generates high-quality reasoning traces through carefully designed prompting and rejection sampling.<n>Experiments on two real-world datasets demonstrate GR2's effectiveness.
arXiv Detail & Related papers (2026-02-08T02:12:24Z) - Ranking-aware Reinforcement Learning for Ordinal Ranking [19.678002354790582]
We propose Ranking-Aware Reinforcement Learning (RARL), a novel RL framework that explicitly learns these relationships.<n>RARL features a unified objective that integrates regression and Learning-to-Rank (L2R), enabling mutual improvement between the two tasks.<n>To further enhance training, we introduce Response Mutation Operations (RMO), which inject controlled noise to improve exploration and prevent stagnation at saddle points.
arXiv Detail & Related papers (2026-01-28T13:22:42Z) - ERank: Fusing Supervised Fine-Tuning and Reinforcement Learning for Effective and Efficient Text Reranking [33.25740773392183]
ERank is a highly effective and efficient pointwise reranker built from a reasoning LLM that excels across diverse relevance scenarios.<n>We propose a novel two-stage training pipeline that begins with Supervised Fine-Tuning (SFT)<n>In this stage, we move beyond binary labels and train the model generatively to output fine grained integer scores, which significantly enhances relevance discrimination.<n>We evaluate the ERank reranker on the BRIGHT, FollowIR, TREC DL, and BEIR benchmarks, demonstrating superior effectiveness and robustness compared to existing approaches.
arXiv Detail & Related papers (2025-08-30T14:56:53Z) - ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability [83.16850534680505]
We propose an automated reasoning-intensive training data synthesis framework.<n>A self-consistency data filtering mechanism is designed to ensure the data quality.<n>Our trained reasoning-intensive reranker textbfReasonRank achieves state-of-the-art (SOTA) performance 40.6 on the BRIGHT leaderboard.
arXiv Detail & Related papers (2025-08-09T17:26:18Z) - Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards [50.21528417884747]
We introduce Omni-Thinker, a unified reinforcement learning framework that enhances large language models (LLMs) performance across diverse tasks.<n>Our approach enables consistent optimization across task types and scales RL-based training to subjective domains.<n> Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging.
arXiv Detail & Related papers (2025-07-20T01:50:16Z) - OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM [28.249198952483685]
This paper presents OrderChain, a novel and general prompting paradigm that improves the ordinal understanding ability of MLLMs by specificity and commonality modeling.<n> Comprehensive experiments show that a Large Language and Vision Assistant model with our OrderChain improves baseline LLaVA significantly on diverse OR datasets.<n>To our best knowledge, our OrderChain is the first work that augments MLLMs for OR tasks, and the effectiveness is witnessed across a spectrum of OR datasets.
arXiv Detail & Related papers (2025-04-07T07:53:44Z) - CoRanking: Collaborative Ranking with Small and Large Ranking Agents [94.09834629572403]
Large Language Models (LLMs) have demonstrated superior listwise ranking performance.<n>CoRanking combines small and large ranking models for efficient and effective ranking.
arXiv Detail & Related papers (2025-03-30T13:00:52Z) - Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task.<n>Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z) - Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding [74.31981011985681]
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps.
We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution.
We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures.
arXiv Detail & Related papers (2024-11-06T22:02:30Z) - FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
First, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark.
Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency.
This approach fails in scenarios where the correct answers are in the minority.
We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z) - Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [53.78782375511531]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.<n>This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR)<n>To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.<n>To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.