REPLUG: Retrieval-Augmented Black-Box Language Models
- URL: http://arxiv.org/abs/2301.12652v4
- Date: Wed, 24 May 2023 05:08:07 GMT
- Title: REPLUG: Retrieval-Augmented Black-Box Language Models
- Authors: Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James,
Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
- Abstract summary: REPLUG is a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
We show that REPLUG significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.
- Score: 101.60145719119373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce REPLUG, a retrieval-augmented language modeling framework that
treats the language model (LM) as a black box and augments it with a tuneable
retrieval model. Unlike prior retrieval-augmented LMs that train language
models with special cross attention mechanisms to encode the retrieved text,
REPLUG simply prepends retrieved documents to the input for the frozen
black-box LM. This simple design can be easily applied to any existing
retrieval and language models. Furthermore, we show that the LM can be used to
supervise the retrieval model, which can then find documents that help the LM
make better predictions. Our experiments demonstrate that REPLUG with the tuned
retriever significantly improves the performance of GPT-3 (175B) on language
modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by
5.1%.
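The two ideas in the abstract (prepending retrieved documents to the input of a frozen LM, and letting the LM supervise the retriever) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The `BlackBoxLM` interface, the `toy_lm` stand-in, the `"\n\n"` separator, and the temperatures `gamma` and `beta` are assumptions made for the example. The abstract does not spell out how outputs from several documents are combined or which loss is used; the sketch assumes a similarity-weighted mixture of per-document predictions and a KL divergence between the retriever's distribution over documents and an LM-derived one.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical black-box interface: a prompt string goes in, a
# next-token probability distribution {token: probability} comes out.
BlackBoxLM = Callable[[str], Dict[str, float]]


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prepend_and_ensemble(
    lm: BlackBoxLM,
    query: str,
    docs: Sequence[str],
    doc_scores: Sequence[float],
) -> Dict[str, float]:
    """Prepend each document to the query, call the frozen LM once per
    document, and mix the predictions with retrieval-score weights."""
    weights = softmax(doc_scores)
    mixed: Dict[str, float] = {}
    for doc, weight in zip(docs, weights):
        # The LM itself is never modified; only its input changes.
        dist = lm(doc + "\n\n" + query)
        for token, prob in dist.items():
            mixed[token] = mixed.get(token, 0.0) + weight * prob
    return mixed


def retriever_kl_loss(
    doc_scores: Sequence[float],
    lm_gold_probs: Sequence[float],
    gamma: float = 1.0,
    beta: float = 1.0,
) -> float:
    """KL(P_R || Q_LM) over the retrieved documents (assumed objective).

    P_R is the softmax of retriever similarity scores; Q_LM is the softmax
    of the probability the LM gives the gold continuation when each
    document is prepended. Only the retriever would be updated.
    """
    p_r = softmax(doc_scores, gamma)
    q_lm = softmax(lm_gold_probs, beta)
    return sum(p * math.log(p / q) for p, q in zip(p_r, q_lm))


def toy_lm(prompt: str) -> Dict[str, float]:
    """Stand-in for a real black-box LM, used only for this demo."""
    if "Paris" in prompt:
        return {"Paris": 0.9, "Rome": 0.1}
    return {"Paris": 0.4, "Rome": 0.6}


docs = ["Paris is the capital of France.", "Rome is in Italy."]
scores = [2.0, 0.0]  # hypothetical retriever similarity scores
mixed = prepend_and_ensemble(toy_lm, "The capital of France is", docs, scores)
loss = retriever_kl_loss(scores, [toy_lm(d)["Paris"] for d in docs])
```

In this toy run the higher-scored document dominates the mixture, so `mixed` favors "Paris". The loss is zero only when the retriever's ranking distribution already matches the LM-derived one, which is the sense in which the LM supervises the retriever.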
Related papers
- DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models.
We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations.
Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
arXiv Detail & Related papers (2024-06-17T17:42:57Z)
- Multi-modal preference alignment remedies regression of visual instruction tuning on language model [7.9311636400991485]
We propose a distillation-based multi-modal alignment model with fine-grained annotations on a small dataset to restore language capability after visual instruction tuning.
Our findings indicate that with DPO we are able to surpass the instruction-following capabilities of the language model, achieving a 6.73 score on MT-Bench, compared to Vicuna's 6.57 and LLaVA's 5.99, despite the small data scale.
arXiv Detail & Related papers (2024-02-16T18:42:08Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval [51.437420003471615]
We propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch.
RPT improves retrieval quality and subsequently perplexity across the board compared to strong baselines.
arXiv Detail & Related papers (2023-06-23T10:18:02Z)
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models [32.95155349925248]
We propose a modular paradigm ReWOO that detaches the reasoning process from external observations, thus significantly reducing token consumption.
We show that ReWOO achieves 5x token efficiency and 4% accuracy improvement on HotpotQA, a multi-step reasoning benchmark.
Our illustrative work offloads reasoning ability from 175B GPT3.5 into 7B LLaMA, demonstrating the significant potential for truly efficient and scalable ALM systems.
arXiv Detail & Related papers (2023-05-23T00:16:48Z)
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [56.104476412839944]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR).
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
- Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model [33.729248437727634]
Augmenting pretrained language models with retrievers has shown promise in effectively solving common NLP problems.
We evaluate the strengths and weaknesses of popular retriever-augmented language models, namely kNN-LM, REALM, DPR + FiD, Contriever + ATLAS, and Contriever + Flan-T5.
arXiv Detail & Related papers (2022-12-18T19:27:41Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
- mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.