Related papers: RankLLM: A Python Package for Reranking with LLMs


URL: http://arxiv.org/abs/2505.19284v1
Date: Sun, 25 May 2025 19:29:27 GMT
Title: RankLLM: A Python Package for Reranking with LLMs
Authors: Sahel Sharifymoghaddam, Ronak Pradeep, Andre Slavescu, Ryan Nguyen, Andrew Xu, Zijian Chen, Yilin Zhang, Yidi Chen, Jasper Xian, Jimmy Lin
Abstract summary: This paper introduces RankLLM, an open-source Python package for reranking with large language models (LLMs). To improve usability, RankLLM features optional integration with Pyserini for retrieval and provides integrated evaluation for multi-stage pipelines. We reproduce results from RankGPT, LRL, RankVicuna, RankZephyr, and other recent models.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The adoption of large language models (LLMs) as rerankers in multi-stage retrieval systems has gained significant traction in academia and industry. These models refine a candidate list of retrieved documents, often through carefully designed prompts, and are typically used in applications built on retrieval-augmented generation (RAG). This paper introduces RankLLM, an open-source Python package for reranking that is modular, highly configurable, and supports both proprietary and open-source LLMs in customized reranking workflows. To improve usability, RankLLM features optional integration with Pyserini for retrieval and provides integrated evaluation for multi-stage pipelines. Additionally, RankLLM includes a module for detailed analysis of input prompts and LLM responses, addressing reliability concerns with LLM APIs and non-deterministic behavior in Mixture-of-Experts (MoE) models. This paper presents the architecture of RankLLM, along with a detailed step-by-step guide and sample code. We reproduce results from RankGPT, LRL, RankVicuna, RankZephyr, and other recent models. RankLLM integrates with common inference frameworks and a wide range of LLMs. This compatibility allows for quick reproduction of reported results, helping to speed up both research and real-world applications. The complete repository is available at rankllm.ai, and the package can be installed via PyPI.

Related papers

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types. We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of meta-error patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking [15.060195612587805]
We introduce a unified framework, LLM4Ranking, which enables users to adopt different ranking methods using open-source or closed-source API-based LLMs. Our framework provides a simple interface for document reranking with LLMs, as well as easy-to-use evaluation and fine-tuning scripts for this task.
arXiv Detail & Related papers (2025-04-10T04:08:38Z)
LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences [21.777817032607405]
This paper introduces LLM-QE, a novel approach that leverages Large Language Models (LLMs) to generate document-based query expansions. Experiments on the zero-shot dense retrieval model Contriever demonstrate the effectiveness of LLM-QE, achieving an improvement of over 8%. LLM-QE also improves the training process of dense retrievers, achieving a more than 5% improvement after fine-tuning.
arXiv Detail & Related papers (2025-02-24T11:15:41Z)
Optimizing Model Selection for Compound AI Systems [76.69936664916061]
We propose an efficient framework for model selection in compound systems. It iteratively selects one module and allocates to it the model with the highest module-wise performance. It confers 5%-70% accuracy gains compared to using the same LLM for all modules.
arXiv Detail & Related papers (2025-02-20T18:36:25Z)
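The model-selection summary describes an iterative procedure: pick one module at a time and assign it the candidate model with the best module-wise performance, starting from the same LLM everywhere. A minimal sketch follows. It assumes a hypothetical `module_score(module, allocation)` function that estimates one module's performance under a full allocation. How the paper actually estimates module-wise performance is not stated in this summary. The sketch also adopts a stopping rule of its own: stop when a full pass changes nothing.

```python
from typing import Callable, Dict, Sequence

Allocation = Dict[str, str]  # module name -> model name


def select_models(modules: Sequence[str], models: Sequence[str],
                  module_score: Callable[[str, Allocation], float],
                  max_rounds: int = 10) -> Allocation:
    """Revisit modules one at a time and give each the model that
    maximizes its (hypothetical) module-wise score."""
    # Baseline: the same LLM for every module.
    allocation = {m: models[0] for m in modules}
    for _ in range(max_rounds):
        changed = False
        for module in modules:
            def score_with(model: str) -> float:
                trial = dict(allocation)
                trial[module] = model
                return module_score(module, trial)

            best = max(models, key=score_with)
            # Switch only on a strict improvement to avoid tie oscillation.
            if score_with(best) > score_with(allocation[module]):
                allocation[module] = best
                changed = True
        if not changed:
            break  # a full pass made no change
    return allocation
```

Because one module's score may depend on the models chosen for the others, more than one pass can be needed. `max_rounds` bounds the total number of score evaluations.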
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving competitive performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG. In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning. In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z)
RRADistill: Distilling LLMs' Passage Ranking Ability for Long-Tail Queries Document Re-Ranking on a Search Engine [2.0379810233726126]
Large Language Models (LLMs) excel at understanding the semantic relationships between queries and documents. Long-tail queries, however, are challenging for feedback-based rankings due to sparse user engagement and limited feedback. We propose an efficient label generation pipeline and novel sLLM training methods for both encoder and decoder models.
arXiv Detail & Related papers (2024-10-08T11:28:06Z)
The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation [4.524402497958597]
This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multi-model workflows. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process.
arXiv Detail & Related papers (2024-08-16T12:01:55Z)
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable [11.894203842968745]
Parrot is a service system that focuses on the end-to-end experience of LLM-based applications. A Semantic Variable annotates an input/output variable in the prompt of a request, and creates the data pipeline when connecting multiple LLM requests.
arXiv Detail & Related papers (2024-05-30T09:46:36Z)
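The Parrot summary says that semantic variables annotate the inputs and outputs of prompts, and that this defines a data pipeline across multiple LLM requests. The sketch below illustrates only that dependency idea. The `{{name}}` placeholder syntax, the `Request` type, and `call_llm` are hypothetical and are not Parrot's interface. The sketch derives request dependencies from variable names and runs requests in topological order using Python's standard `graphlib` module (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, NamedTuple, Set

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # hypothetical {{var}} syntax


class Request(NamedTuple):
    template: str  # prompt text whose {{var}} placeholders are inputs
    output: str    # name of the variable this request produces


def run_pipeline(requests: List[Request], inputs: Dict[str, str],
                 call_llm: Callable[[str], str]) -> Dict[str, str]:
    producer = {r.output: r for r in requests}
    # A request depends on every other request whose output it reads.
    graph: Dict[str, Set[str]] = {
        r.output: {v for v in PLACEHOLDER.findall(r.template) if v in producer}
        for r in requests
    }
    values = dict(inputs)
    for var in TopologicalSorter(graph).static_order():
        req = producer[var]
        prompt = PLACEHOLDER.sub(lambda m: values[m.group(1)], req.template)
        values[var] = call_llm(prompt)  # hypothetical LLM call
    return values
```

The requests can be listed in any order. The graph, not the list position, determines when each request runs, and a missing input variable surfaces as a `KeyError`.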
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions. We also propose the PPTX-Match Evaluation System that evaluates if LLMs finish the instruction based on the prediction file rather than the label API sequence. The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
arXiv Detail & Related papers (2023-11-03T08:06:35Z)
Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios. Our method, Large language model as Retriever (LameR), relies on no neural model other than an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
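The LLM-Augmenter entry describes its modules only at a high level. Its title ("check your facts and try again") together with "external knowledge and automated feedback" suggests a loop of four steps: retrieve evidence, generate an answer, check it, and regenerate with feedback. The following is a minimal sketch under that reading. `retrieve`, `generate`, and `check` are hypothetical callables, not the paper's actual modules, and the threshold and retry count are illustrative.

```python
from typing import Callable, List, Optional, Tuple


def augmented_answer(query: str,
                     retrieve: Callable[[str], List[str]],
                     generate: Callable[[str], str],
                     check: Callable[[str, List[str]], Tuple[float, str]],
                     threshold: float = 0.5,
                     max_tries: int = 3) -> str:
    evidence = retrieve(query)  # external knowledge for the black-box LLM
    feedback: Optional[str] = None
    answer = ""
    for _ in range(max_tries):
        prompt = "Evidence:\n" + "\n".join(evidence) + f"\nQuestion: {query}"
        if feedback:
            # Automated feedback from the previous check is fed back in.
            prompt += f"\nFeedback on previous answer: {feedback}"
        answer = generate(prompt)
        score, feedback = check(answer, evidence)
        if score >= threshold:
            break  # answer judged consistent enough with the evidence
    return answer  # last attempt if no answer passed the check
```

The LLM itself is only called through `generate`, which matches the summary's point that the LLM is treated as a black box.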

This list is automatically generated from the titles and abstracts of the papers on this site.