REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
- URL: http://arxiv.org/abs/2505.20046v1
- Date: Mon, 26 May 2025 14:31:48 GMT
- Title: REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
- Authors: Le Zhang, Bo Wang, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal
- Abstract summary: We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. REARANK explicitly reasons before reranking, significantly improving both performance and interpretability.
- Score: 69.8397511935806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. REARANK explicitly reasons before reranking, significantly improving both performance and interpretability. Leveraging reinforcement learning and data augmentation, REARANK achieves substantial improvements over baseline models across popular information retrieval benchmarks, notably requiring only 179 annotated samples. Built on top of Qwen2.5-7B, our REARANK-7B demonstrates performance comparable to GPT-4 on both in-domain and out-of-domain benchmarks and even surpasses GPT-4 on reasoning-intensive BRIGHT benchmarks. These results underscore the effectiveness of our approach and highlight how reinforcement learning can enhance LLM reasoning capabilities in reranking.
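The abstract does not detail the training signal, but a listwise reranker optimized with reinforcement learning plausibly scores a "reason, then rank" completion with an IR metric such as NDCG. The sketch below is a minimal illustration under that assumption; the </think> delimiter, the bracketed-identifier ranking format, and the function names are hypothetical rather than taken from the paper.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a list of graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(predicted_order, relevance, k=10):
    """NDCG@k of a predicted candidate ordering against gold relevance labels."""
    gains = [relevance[doc_id] for doc_id in predicted_order[:k]]
    ideal_dcg = dcg(sorted(relevance.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def reranking_reward(completion, candidate_ids, relevance, k=10):
    """Parse a hypothetical 'reason, then rank' completion such as
    '<think>...</think> [2] > [1] > [3]' and score the final ordering."""
    ranking_part = completion.split("</think>")[-1]
    order = []
    for token in ranking_part.split(">"):
        idx = token.strip().strip("[]")
        if idx.isdigit() and 1 <= int(idx) <= len(candidate_ids):
            order.append(candidate_ids[int(idx) - 1])
    return ndcg_at_k(order, relevance, k)

# Toy usage: three candidates, only d2 is relevant; a correct ranking earns 1.0.
relevance = {"d1": 0, "d2": 1, "d3": 0}
completion = "<think>d2 directly answers the query.</think> [2] > [1] > [3]"
print(reranking_reward(completion, ["d1", "d2", "d3"], relevance))  # 1.0
```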
Related papers
- Checklists Are Better Than Reward Models For Aligning Language Models [99.1896531064102]
We propose "Reinforcement Learning from Checklist Feedback" (RLCF)<n>From instructions, we extract checklists and evaluate how well responses satisfy each item.<n>Using both AI judges and specialized verifier programs, we combine these scores to compute rewards for RL.
arXiv Detail & Related papers (2025-07-24T17:58:00Z)
- Lessons from Training Grounded LLMs with Verifiable Rewards [24.35637263339965]
Reinforcement learning and internal reasoning can enhance grounding in large language models. We show that reasoning-augmented models significantly outperform instruction-only variants. A two-stage training setup, first optimizing answer and citation behavior and then refusal, further improves grounding.
arXiv Detail & Related papers (2025-06-18T14:58:13Z)
- Phi-4-reasoning Technical Report [42.508165017775]
We introduce Phi-4-reasoning, a 14-billion-parameter reasoning model that achieves strong performance on complex reasoning tasks. We develop Phi-4-reasoning-plus, a variant enhanced through a short phase of outcome-based reinforcement learning. Both models outperform significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approach the performance level of the full DeepSeek-R1 model.
arXiv Detail & Related papers (2025-04-30T05:05:09Z)
- GPT Meets Graphs and KAN Splines: Testing Novel Frameworks on Multitask Fine-Tuned GPT-2 with LoRA [0.0]
We explore the potential of integrating learnable and interpretable modules, specifically Kolmogorov-Arnold Networks (KAN) and graph-based representations, within a pre-trained GPT-2 model.
arXiv Detail & Related papers (2025-03-25T19:58:25Z)
- RAG-Reward: Optimizing RAG with Reward Modeling and RLHF [8.911260109659489]
Retrieval-augmented generation (RAG) enhances Large Language Models (LLMs) with relevant and up-to-date knowledge. The role of reward models in reinforcement learning for optimizing RAG remains underexplored. We introduce RAG-Reward, a framework designed to develop reward models.
arXiv Detail & Related papers (2025-01-22T22:59:19Z)
- RaCT: Ranking-aware Chain-of-Thought Optimization for LLMs [22.51924253176532]
Large language models (LLMs) have shown significant promise in text reranking tasks. Traditional supervised fine-tuning on ranking utilities can compromise LLMs' general-purpose abilities. We propose RaCT, a novel reranking algorithm that implements SFT with Chain-of-Thought prompting, followed by ranking preference optimization.
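The abstract does not give the exact preference objective; as a rough illustration, the sketch below uses a pairwise Bradley-Terry-style loss that prefers a better-scored ordering over a worse one, which is an assumption rather than RaCT's actual loss.

```python
import math

def ranking_preference_loss(score_preferred, score_rejected, beta=1.0):
    """Pairwise Bradley-Terry-style objective: push the model's score for the
    preferred ordering above the score for the rejected one (-log sigmoid of the margin)."""
    margin = beta * (score_preferred - score_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy usage: the preferred ordering is scored 2.0, the rejected one 0.5.
print(round(ranking_preference_loss(2.0, 0.5), 4))  # 0.2014
```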
arXiv Detail & Related papers (2024-12-18T23:24:15Z)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with state-of-the-art performance on RAG benchmarks.
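A minimal sketch of what instruction-tuning a single LLM for both context ranking and answer generation could look like: one fine-tuning mixture interleaving ranking records and generation records. The record schema and prompt templates below are illustrative assumptions, not RankRAG's actual format.

```python
def ranking_record(question, passages, order):
    """Instruction-tuning record for the context-ranking task."""
    return {
        "instruction": f"Rank the passages by relevance to the question: {question}",
        "input": "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages)),
        "output": " > ".join(f"[{i + 1}]" for i in order),
    }

def generation_record(question, passages, answer):
    """Instruction-tuning record for the answer-generation task."""
    return {
        "instruction": f"Answer the question using the passages: {question}",
        "input": "\n".join(passages),
        "output": answer,
    }

# A single fine-tuning mixture interleaves both task types for one model.
mixture = [
    ranking_record("Who wrote Dune?",
                   ["Frank Herbert wrote Dune.", "Isaac Asimov wrote Foundation."],
                   order=[0, 1]),
    generation_record("Who wrote Dune?",
                      ["Frank Herbert wrote Dune."],
                      "Frank Herbert"),
]
print(mixture[0]["output"])  # [1] > [2]
```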
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
- FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
We introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark.
Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
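Per the abstract, FIRST reads an entire ranking off the output logits of the first generated identifier rather than decoding a full permutation. A minimal sketch of that scoring step, assuming each candidate is tagged with a single identifier token and the first-step logit vector is already available; the token ids and helper name are hypothetical.

```python
def rank_from_first_token_logits(first_step_logits, identifier_token_ids, candidates):
    """Order candidates by the logit their identifier token receives at the
    first decoding step, instead of generating a full ranked list token by token."""
    scored = zip((first_step_logits[t] for t in identifier_token_ids), candidates)
    return [cand for _, cand in sorted(scored, key=lambda pair: pair[0], reverse=True)]

# Toy usage: logits indexed by the token ids of identifiers 'A', 'B', 'C'.
first_step_logits = {101: 1.2, 102: 3.4, 103: -0.5}
print(rank_from_first_token_logits(first_step_logits, [101, 102, 103],
                                   ["doc1", "doc2", "doc3"]))
# ['doc2', 'doc1', 'doc3']
```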
arXiv Detail & Related papers (2024-06-21T21:27:50Z)
- RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, ours provides feedback well aligned with the rewriting objectives.
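A minimal sketch of annotation-free feedback under one reading of the abstract: candidate rewrites are scored by how highly a publicly available reranker rates the passages they retrieve. The aggregation and the stub retriever/reranker below are illustrative assumptions.

```python
def best_rewrite(query, rewrites, retrieve, rerank_score, k=3):
    """Pick the candidate rewrite whose retrieved passages a public reranker
    rates highest; the mean-of-top-k aggregation is an illustrative choice."""
    def feedback(rewrite):
        passages = retrieve(rewrite)[:k]
        scores = [rerank_score(query, p) for p in passages]
        return sum(scores) / len(scores) if scores else 0.0
    return max(rewrites, key=feedback)

# Toy usage with stub retrieval and reranking functions.
corpus = {"capital of France": ["Paris is the capital of France."],
          "France overview": ["France is a country in Europe."]}
retrieve = lambda q: corpus.get(q, [])
rerank_score = lambda q, p: 1.0 if "capital" in p else 0.2
print(best_rewrite("What is France's capital?",
                   ["capital of France", "France overview"],
                   retrieve, rerank_score))  # capital of France
```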
arXiv Detail & Related papers (2024-05-23T11:00:19Z)
- A Critical Evaluation of AI Feedback for Aligning Large Language Models [60.42291111149438]
We show that simple supervised fine-tuning with GPT-4 as the teacher outperforms existing RLAIF pipelines.
More generally, we find that the gains from RLAIF vary substantially across base model families, test-time evaluation protocols, and critic models.
arXiv Detail & Related papers (2024-02-19T18:53:54Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.