Related papers: Rank1: Test-Time Compute for Reranking in Information Retrieval

URL: http://arxiv.org/abs/2502.18418v1
Date: Tue, 25 Feb 2025 18:14:06 GMT
Title: Rank1: Test-Time Compute for Reranking in Information Retrieval
Authors: Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme
Abstract summary: Rank1 is the first reranking model trained to take advantage of test-time compute. We gather and open-source a dataset of more than 600,000 examples of R1 reasoning traces from queries and passages in MS MARCO.
Score: 45.356614696154075
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Rank1, the first reranking model trained to take advantage of test-time compute. Rank1 demonstrates the applicability within retrieval of using a reasoning language model (e.g., OpenAI's o1 or DeepSeek's R1) for distillation in order to rapidly improve the performance of a smaller model. We gather and open-source a dataset of more than 600,000 examples of R1 reasoning traces from queries and passages in MS MARCO. Models trained on this dataset (1) achieve state-of-the-art performance on advanced reasoning and instruction-following datasets; (2) work remarkably well out of distribution due to their ability to respond to user-input prompts; and (3) produce explainable reasoning chains that can be given to users or RAG-based systems. Further, we demonstrate that quantized versions of these models retain strong performance while using less compute/memory. Overall, Rank1 shows that test-time compute allows for a fundamentally new type of explainable and performant reranker model for search.

Related papers

Table-R1: Inference-Time Scaling for Table Reasoning [25.481170375825812]
We develop and evaluate two post-training strategies to enable inference-time scaling. For distillation, we introduce a large-scale dataset of reasoning traces generated by DeepSeek-R1. For RLVR, we propose task-specific verifiable reward functions and apply the GRPO algorithm to obtain the Table-R1-Zero model.
arXiv Detail & Related papers (2025-05-29T16:28:50Z)
Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task. Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z)
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model [70.77691645678804]
We present the first successful replication of emergent characteristics for multimodal reasoning on only a non-SFT 2B model. Our model achieves 59.47% accuracy on CVBench, outperforming the base model by approximately 30% and exceeding both SFT settings by 2%. In addition, we share our failed attempts and insights from trying to achieve R1-like reasoning using RL with instruct models.
arXiv Detail & Related papers (2025-03-07T04:21:47Z)
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification [35.347715518778095]
We study the scaling trends governing sampling-based search. We find that simply scaling up a minimalist implementation of sampling-based search provides a practical inference method. We identify two useful principles for improving self-verification capabilities with test-time compute.
arXiv Detail & Related papers (2025-02-03T21:31:07Z)
s1: Simple test-time scaling [148.4204982041058]
Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. We seek the simplest approach to achieve test-time scaling and strong reasoning performance.
arXiv Detail & Related papers (2025-01-31T18:48:08Z)
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. We present T1 to scale reinforcement learning by encouraging exploration and to understand inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference. This paper presents the first comprehensive study on the prevalent issue of overthinking in these models. We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model [69.08287909042421]
We show that OpenAI's o1 model has achieved the best performance on most datasets. We also provide a detailed analysis on several reasoning benchmarks.
arXiv Detail & Related papers (2024-10-17T15:09:03Z)
Improving Passage Retrieval with Zero-Shot Question Generation [109.11542468380331]
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage.
arXiv Detail & Related papers (2022-04-15T14:51:41Z)
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches. We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation. Overall, SPLADE is considerably improved with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
Learning Dense Representations of Phrases at Scale [22.792942611601347]
We show for the first time that we can learn dense phrase representations alone that achieve much stronger performance in open-domain QA. Our model DensePhrases improves previous phrase retrieval models by 15%-25% absolute accuracy. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs.
arXiv Detail & Related papers (2020-12-23T12:28:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.