RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large
Language Models
- URL: http://arxiv.org/abs/2309.15088v1
- Date: Tue, 26 Sep 2023 17:31:57 GMT
- Title: RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large
Language Models
- Authors: Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin
- Abstract summary: We present RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting.
Experimental results on the TREC 2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter model, although our effectiveness remains slightly behind reranking with GPT-4.
- Score: 56.51705482912727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have successfully applied large language models (LLMs) such as
ChatGPT to reranking in an information retrieval context, but to date, such
work has mostly been built on proprietary models hidden behind opaque API
endpoints. This approach yields experimental results that are not reproducible
and non-deterministic, threatening the veracity of outcomes that build on such
shaky foundations. To address this significant shortcoming, we present
RankVicuna, the first fully open-source LLM capable of performing high-quality
listwise reranking in a zero-shot setting. Experimental results on the TREC
2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness
comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter
model, although our effectiveness remains slightly behind reranking with GPT-4.
We hope our work provides the foundation for future research on reranking with
modern LLMs. All the code necessary to reproduce our results is available at
https://github.com/castorini/rank_llm.
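The core technique described above, listwise reranking, has the LLM see all candidate passages at once and emit a ranking. A minimal sketch of the idea follows; the `generate` callable and the exact prompt wording are illustrative assumptions, not RankVicuna's actual prompt or API:

```python
import re

def listwise_rerank(query, passages, generate):
    """Zero-shot listwise reranking sketch: prompt an LLM with numbered
    passages and parse the identifier ordering it returns.
    `generate` is a hypothetical function mapping a prompt string to
    the model's text output."""
    # Number each candidate passage so the model can refer to it.
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Rank the following passages by relevance to the query.\n"
        f"Query: {query}\n{numbered}\n"
        "Output the ranking as identifiers, e.g. [2] > [1] > [3]."
    )
    response = generate(prompt)
    # Parse identifiers like "[3] > [1] > [2]" back into 0-based indices.
    order = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", response)]
    # Keep only valid, first-seen indices; append any passages the model omitted.
    seen, ranking = set(), []
    for i in order:
        if 0 <= i < len(passages) and i not in seen:
            seen.add(i)
            ranking.append(i)
    ranking += [i for i in range(len(passages)) if i not in seen]
    return [passages[i] for i in ranking]
```

In practice (and in the paper's setting), candidate lists longer than the model's context window are handled with a sliding window over the first-stage retrieval results; the sketch above covers only a single window.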
Related papers
- Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models [5.0490573482829335]
Large Language Models (LLMs) have been revolutionizing a myriad of natural language processing tasks with their diverse zero-shot capabilities.
This paper investigates the use of a pre-filtering step before passage re-ranking in information retrieval (IR).
Our experiments show that this pre-filtering step allows the LLM to perform significantly better at the re-ranking task.
arXiv Detail & Related papers (2024-06-26T20:12:24Z)
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [62.02920842630234]
We show how to build small models that have GPT-4-level performance but for 400x lower cost.
We unify pre-existing datasets into a benchmark LLM-AggreFact.
Our best system MiniCheck-FT5 (770M parameters) outperforms all systems of comparable size and reaches GPT-4 accuracy.
arXiv Detail & Related papers (2024-04-16T17:59:10Z)
- ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs [60.81649785463651]
We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations.
Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
arXiv Detail & Related papers (2024-02-09T11:23:14Z)
- Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models [59.52207546810294]
Listwise rerankers based on large language models (LLMs) are the zero-shot state of the art.
In this work, we build for the first time effective listwise rerankers without any form of dependency on GPT.
Our best listwise reranker surpasses the listwise rerankers based on GPT-3.5 by 13% and achieves 97% of the effectiveness of the ones built on GPT-4.
arXiv Detail & Related papers (2023-12-05T18:57:40Z)
- The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications [20.339673903885483]
Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis capabilities.
Details about LLM training data are often not made public, which has caused concern as to whether existing bug benchmarks are included.
We present the GitHub Recent Bugs dataset, which includes 76 real-world Java bugs that were gathered after the OpenAI data cut-off point.
arXiv Detail & Related papers (2023-10-20T02:37:44Z)
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [56.104476412839944]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR).
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study [115.96080028033904]
We study a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT.
Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
arXiv Detail & Related papers (2023-04-13T18:04:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.