RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large
Language Models
- URL: http://arxiv.org/abs/2309.15088v1
- Date: Tue, 26 Sep 2023 17:31:57 GMT
- Title: RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large
Language Models
- Authors: Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin
- Abstract summary: We present RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting.
Experimental results on the TREC 2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter model, although our effectiveness remains slightly behind reranking with GPT-4.
- Score: 56.51705482912727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have successfully applied large language models (LLMs) such as
ChatGPT to reranking in an information retrieval context, but to date, such
work has mostly been built on proprietary models hidden behind opaque API
endpoints. This approach yields experimental results that are not reproducible
and non-deterministic, threatening the veracity of outcomes that build on such
shaky foundations. To address this significant shortcoming, we present
RankVicuna, the first fully open-source LLM capable of performing high-quality
listwise reranking in a zero-shot setting. Experimental results on the TREC
2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness
comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter
model, although our effectiveness remains slightly behind reranking with GPT-4.
We hope our work provides the foundation for future research on reranking with
modern LLMs. All the code necessary to reproduce our results is available at
https://github.com/castorini/rank_llm.
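
The sketch below is a rough illustration of this kind of zero-shot listwise reranking, not the official rank_llm implementation: it prompts an open-source instruction-tuned model to order a handful of candidate passages for a query and parses the returned identifier list. The checkpoint name, prompt wording, and output parsing are assumptions made purely for illustration.

# Illustrative sketch only (not the rank_llm API): zero-shot listwise
# reranking with an open-source instruction-tuned LLM. Checkpoint name,
# prompt wording, and parsing are assumptions for illustration.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "castorini/rank_vicuna_7b_v1"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

def listwise_rerank(query: str, passages: list[str]) -> list[int]:
    """Ask the model to order candidate passages by relevance to the query.

    Returns a permutation of 0-based passage indices, most relevant first.
    """
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"I will provide you with {len(passages)} passages, each indicated by "
        f"a numerical identifier [].\n"
        f"Rank the passages based on their relevance to the query: {query}\n\n"
        f"{numbered}\n\n"
        "Output the ranking as identifiers in descending order of relevance, "
        "e.g., [2] > [1] > [3]. Only respond with the ranking."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    generated = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # Parse identifiers such as "[3] > [1] > [2]" back into 0-based indices,
    # appending any passages the model omitted in their original order.
    seen, ranking = set(), []
    for token in re.findall(r"\[(\d+)\]", generated):
        idx = int(token) - 1
        if 0 <= idx < len(passages) and idx not in seen:
            seen.add(idx)
            ranking.append(idx)
    ranking.extend(i for i in range(len(passages)) if i not in seen)
    return ranking

# Example: rerank three candidate passages for a query.
order = listwise_rerank(
    "what causes aurora borealis",
    [
        "The aurora is caused by charged solar particles hitting the atmosphere.",
        "Boreal forests cover much of Canada and Scandinavia.",
        "Solar wind interacts with Earth's magnetic field near the poles.",
    ],
)
print(order)  # e.g., [0, 2, 1]
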
Related papers
- See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses [51.975495361024606]
We propose a Self-Challenge evaluation framework with human-in-the-loop.
Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances.
We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses.
arXiv Detail & Related papers (2024-08-16T19:01:52Z)
- Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models [5.0490573482829335]
Large Language Models (LLMs) have been revolutionizing a myriad of natural language processing tasks with their diverse zero-shot capabilities.
This paper investigates the use of a pre-filtering step before passage re-ranking in information retrieval (IR).
Our experiments show that this pre-filtering allows the LLM to perform significantly better at the re-ranking task.
arXiv Detail & Related papers (2024-06-26T20:12:24Z)
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [62.02920842630234]
We show how to build small fact-checking models that achieve GPT-4-level performance at 400x lower cost.
We do this by constructing synthetic training data with GPT-4, which involves creating realistic yet challenging instances of factual errors.
For evaluation, we unify datasets from recent work on fact-checking and grounding LLM generations into a new benchmark, LLM-AggreFact.
arXiv Detail & Related papers (2024-04-16T17:59:10Z)
- Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models [59.52207546810294]
Listwise rerankers based on large language models (LLMs) are the zero-shot state of the art.
In this work, we build, for the first time, effective listwise rerankers without any form of dependency on GPT.
Our best listwise reranker surpasses listwise rerankers based on GPT-3.5 by 13% and achieves 97% of the effectiveness of those built on GPT-4.
arXiv Detail & Related papers (2023-12-05T18:57:40Z)
- The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications [20.339673903885483]
Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis capabilities.
Details about LLM training data are often not made public, which has caused concern as to whether existing bug benchmarks are included.
We present the GitHub Recent Bugs dataset, which includes 76 real-world Java bugs that were gathered after the OpenAI data cut-off point.
arXiv Detail & Related papers (2023-10-20T02:37:44Z)
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [56.104476412839944]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR).
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study [115.96080028033904]
We study a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT.
Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
arXiv Detail & Related papers (2023-04-13T18:04:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.