REPLUG: Retrieval-Augmented Black-Box Language Models
- URL: http://arxiv.org/abs/2301.12652v4
- Date: Wed, 24 May 2023 05:08:07 GMT
- Title: REPLUG: Retrieval-Augmented Black-Box Language Models
- Authors: Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James,
Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
- Abstract summary: REPLUG is a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
We show that REPLUG significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.
- Score: 101.60145719119373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce REPLUG, a retrieval-augmented language modeling framework that
treats the language model (LM) as a black box and augments it with a tuneable
retrieval model. Unlike prior retrieval-augmented LMs that train language
models with special cross attention mechanisms to encode the retrieved text,
REPLUG simply prepends retrieved documents to the input for the frozen
black-box LM. This simple design can be easily applied to any existing
retrieval and language models. Furthermore, we show that the LM can be used to
supervise the retrieval model, which can then find documents that help the LM
make better predictions. Our experiments demonstrate that REPLUG with the tuned
retriever significantly improves the performance of GPT-3 (175B) on language
modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by
5.1%.
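A reasonable reading of this design (and the one used in the full paper) is to prepend each retrieved document to the input separately, query the frozen LM once per document, and ensemble the per-document next-token distributions weighted by retrieval scores; the same LM scores can then supervise the retriever. The sketch below is a minimal, non-authoritative illustration of that pipeline: `retrieve` and `lm_next_token_logprobs` are hypothetical placeholders for an off-the-shelf dense retriever and a black-box LM API that exposes token log-probabilities, and the score-weighted ensemble plus KL-style supervision follow the paper's general description rather than its exact implementation.

```python
# Minimal sketch of REPLUG-style inference and LM-supervised retrieval.
# `retrieve` and `lm_next_token_logprobs` are hypothetical stand-ins:
# any dense retriever and any LM API returning token log-probs would do.
import numpy as np

def retrieve(query: str, k: int) -> tuple[list[str], np.ndarray]:
    """Return top-k documents and their (unnormalized) similarity scores."""
    raise NotImplementedError  # plug in an off-the-shelf dense retriever

def lm_next_token_logprobs(prompt: str) -> np.ndarray:
    """Return next-token log-probabilities from the frozen black-box LM."""
    raise NotImplementedError  # plug in any LM that exposes log-probs

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def replug_next_token_distribution(query: str, k: int = 10) -> np.ndarray:
    """Prepend each retrieved document to the query, run the frozen LM once
    per document, and mix the next-token distributions weighted by the
    normalized retrieval scores (no change to the LM itself)."""
    docs, scores = retrieve(query, k)
    weights = softmax(scores)                       # retrieval likelihoods
    mixture = None
    for doc, w in zip(docs, weights):
        prompt = doc + "\n\n" + query               # simple prepending
        p = np.exp(lm_next_token_logprobs(prompt))  # per-document distribution
        mixture = w * p if mixture is None else mixture + w * p
    return mixture

def lsr_loss(retrieval_scores: np.ndarray,
             lm_gold_logprobs: np.ndarray,
             temperature: float = 1.0) -> float:
    """LM-supervised retrieval (sketch): pull the retrieval distribution
    toward the distribution implied by how much each document helps the LM
    predict the gold continuation, via a KL-divergence objective."""
    p_retrieval = softmax(retrieval_scores / temperature)
    q_lm = softmax(lm_gold_logprobs / temperature)
    return float(np.sum(p_retrieval * (np.log(p_retrieval) - np.log(q_lm))))
```

Because the LM is touched only through its input text and output probabilities, this recipe works with any API-accessible model; the retriever is the only component that needs gradients.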
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves retrieval performance competitive with state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- Mistral-SPLADE: LLMs for better Learned Sparse Retrieval [7.652738829153342]
We propose to use a decoder-only model for learning semantic keyword expansion.
We use Mistral as the backbone to develop our Learned Sparse Retriever similar to SPLADE.
Our experiments support the hypothesis that a sparse retrieval model based on a decoder-only large language model (LLM) surpasses the performance of existing LSR systems.
arXiv Detail & Related papers (2024-08-20T18:21:54Z)
- DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models.
We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations.
Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
arXiv Detail & Related papers (2024-06-17T17:42:57Z)
- Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models [7.056824589733873]
Multi-modal large language models (MLLMs) are expected to support multi-turn queries of interchanging image and text modalities in production.
Current MLLMs trained with visual-question-answering datasets could suffer from degraded language capability.
We propose a distillation-based multi-modal alignment model with fine-grained annotations on a small dataset that restores and boosts MLLM's language capability after visual instruction tuning.
arXiv Detail & Related papers (2024-02-16T18:42:08Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval [51.437420003471615]
We propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch.
RPT improves retrieval quality and subsequently perplexity across the board compared to strong baselines.
arXiv Detail & Related papers (2023-06-23T10:18:02Z)
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models [32.95155349925248]
We propose a modular paradigm ReWOO that detaches the reasoning process from external observations, thus significantly reducing token consumption.
We show that ReWOO achieves 5x token efficiency and 4% accuracy improvement on HotpotQA, a multi-step reasoning benchmark.
Our illustrative work offloads reasoning ability from 175B GPT3.5 into 7B LLaMA, demonstrating the significant potential for truly efficient and scalable ALM systems.
arXiv Detail & Related papers (2023-05-23T00:16:48Z)
- Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model [33.729248437727634]
Augmenting pretrained language models with retrievers has shown promise in effectively solving common NLP problems.
We evaluate the strengths and weaknesses of popular retriever-augmented language models, namely kNN-LM, REALM, DPR + FiD, Contriever + ATLAS, and Contriever + Flan-T5.
arXiv Detail & Related papers (2022-12-18T19:27:41Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
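As a concrete illustration of the recite-and-answer scheme in the entry above, here is a minimal two-step prompting sketch; `complete` is a hypothetical stand-in for any text-completion LM call, the prompt wording is illustrative rather than taken from the paper, and the paper additionally explores sampling several recitations and aggregating the resulting answers.

```python
# Illustrative sketch of recite-and-answer prompting: the LM first "recites"
# passages from its own parametric memory, then answers conditioned on them.
# `complete` is a hypothetical stand-in for a text-completion LM call
# (assumed to sample with nonzero temperature so recitations can differ).

def complete(prompt: str, max_tokens: int = 256) -> str:
    raise NotImplementedError  # call your LM of choice here

def recite_and_answer(question: str, num_recitations: int = 1) -> str:
    # Step 1: recite passages the model believes are relevant to the question.
    recitations = [
        complete(
            "Recite a passage that is relevant to the question.\n"
            f"Question: {question}\nPassage:"
        )
        for _ in range(num_recitations)
    ]
    # Step 2: answer the question conditioned on the recited passages.
    context = "\n".join(recitations)
    return complete(f"{context}\nQuestion: {question}\nAnswer:")
```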