Rank1: Test-Time Compute for Reranking in Information Retrieval
- URL: http://arxiv.org/abs/2502.18418v1
- Date: Tue, 25 Feb 2025 18:14:06 GMT
- Title: Rank1: Test-Time Compute for Reranking in Information Retrieval
- Authors: Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme,
- Abstract summary: Rank1 is the first reranking model trained to take advantage of test-time compute.<n>We gather and open-source a dataset of more than 600,000 examples of R1 reasoning traces from queries and passages in MS MARCO.
- Score: 45.356614696154075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Rank1, the first reranking model trained to take advantage of test-time compute. Rank1 demonstrates the applicability within retrieval of using a reasoning language model (i.e. OpenAI's o1, Deepseek's R1, etc.) for distillation in order to rapidly improve the performance of a smaller model. We gather and open-source a dataset of more than 600,000 examples of R1 reasoning traces from queries and passages in MS MARCO. Models trained on this dataset show: (1) state-of-the-art performance on advanced reasoning and instruction following datasets; (2) work remarkably well out of distribution due to the ability to respond to user-input prompts; and (3) have explainable reasoning chains that can be given to users or RAG-based systems. Further, we demonstrate that quantized versions of these models retain strong performance while using less compute/memory. Overall, Rank1 shows that test-time compute allows for a fundamentally new type of explainable and performant reranker model for search.
Related papers
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task.
Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z) - R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model [70.77691645678804]
We present the first successful replication of emergent characteristics for multimodal reasoning on only a non-SFT 2B model.
Our model achieves 59.47% accuracy on CVBench, outperforming the base model by approximately 30% and exceeding both SFT setting by 2%.
In addition, we share our failed attempts and insights in attempting to achieve R1-like reasoning using RL with instruct models.
arXiv Detail & Related papers (2025-03-07T04:21:47Z) - Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification [35.347715518778095]
We study the scaling trends governing sampling-based search.<n>We find that simply scaling up a minimalist implementation of sampling-based search provides a practical inference method.<n>We identify two useful principles for improving self-verification capabilities with test-time compute.
arXiv Detail & Related papers (2025-02-03T21:31:07Z) - s1: Simple test-time scaling [148.4204982041058]
Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance.<n>We seek the simplest approach to achieve test-time scaling and strong reasoning performance.
arXiv Detail & Related papers (2025-01-31T18:48:08Z) - Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling.
We present T1 to scale reinforcement learning by encouraging exploration and understand inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z) - A Comparative Study on Reasoning Patterns of OpenAI's o1 Model [69.08287909042421]
We show that OpenAI's o1 model has achieved the best performance on most datasets.
We also provide a detailed analysis on several reasoning benchmarks.
arXiv Detail & Related papers (2024-10-17T15:09:03Z) - Improving Passage Retrieval with Zero-Shot Question Generation [109.11542468380331]
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage.
arXiv Detail & Related papers (2022-04-15T14:51:41Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than $9$% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z) - Learning Dense Representations of Phrases at Scale [22.792942611601347]
We show for the first time that we can learn dense phrase representations alone that achieve much stronger performance in open-domain QA.
Our model DensePhrases improves previous phrase retrieval models by 15%-25% absolute accuracy.
Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs.
arXiv Detail & Related papers (2020-12-23T12:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.