Shall We Pretrain Autoregressive Language Models with Retrieval? A
Comprehensive Study
- URL: http://arxiv.org/abs/2304.06762v3
- Date: Thu, 21 Dec 2023 00:18:48 GMT
- Title: Shall We Pretrain Autoregressive Language Models with Retrieval? A
Comprehensive Study
- Authors: Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad
Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar,
Bryan Catanzaro
- Abstract summary: We study a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT.
Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
- Score: 115.96080028033904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large decoder-only language models (LMs) can be largely improved in terms of
perplexity by retrieval (e.g., RETRO), but the impact of retrieval on text generation
quality and downstream task accuracy remains unclear. Thus, it is still an open
question: shall we pretrain large autoregressive LMs with retrieval? To answer
it, we perform a comprehensive study on a scalable pre-trained
retrieval-augmented LM (i.e., RETRO) compared with standard GPT and
retrieval-augmented GPT incorporated at fine-tuning or inference stages. We
first provide the recipe to reproduce RETRO up to 9.5B parameters while
retrieving a text corpus with 330B tokens. Based on that, we have the following
novel findings: i) RETRO outperforms GPT on text generation with much less
degeneration (i.e., repetition), moderately higher factual accuracy, and
slightly lower toxicity with a nontoxic retrieval database. ii) On the LM
Evaluation Harness benchmark, RETRO largely outperforms GPT on
knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore,
we introduce a simple variant of the model, RETRO++, which largely improves
open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural
Question) and significantly outperforms retrieval-augmented GPT in both
fine-tuning and zero-shot evaluation settings. Our findings highlight the
promising direction of pretraining autoregressive LMs with retrieval as future
foundation models. We release our code and model at:
https://github.com/NVIDIA/Megatron-LM/blob/main/tools/retro/README.md
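The abstract contrasts pretraining with retrieval (RETRO) against adding retrieval to GPT only at fine-tuning or inference time. A minimal, standard-library-only sketch of that inference-time retrieve-then-read baseline is given below; the toy CORPUS, the bag-of-words cosine scorer, and the generate() stub are illustrative placeholders, not the paper's Megatron-LM implementation.
```python
# Minimal retrieve-then-read sketch (standard library only). It illustrates the
# "retrieval-augmented GPT at inference" baseline the abstract compares RETRO
# against; the toy corpus, bag-of-words cosine scoring, and generate() stub are
# illustrative assumptions, not the paper's released implementation.
from collections import Counter
import math

CORPUS = [
    "RETRO conditions a decoder-only LM on chunks retrieved from a large corpus.",
    "Natural Questions is an open-domain question answering benchmark.",
    "Nucleus sampling reduces repetition compared with greedy decoding.",
]

def bow(text: str) -> Counter:
    """Lower-cased bag-of-words representation of a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = bow(query)
    ranked = sorted(CORPUS, key=lambda p: cosine(q, bow(p)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stand-in for an autoregressive LM call (e.g., a GPT decoder)."""
    return f"[LM continuation conditioned on {len(prompt.split())} prompt tokens]"

def retrieval_augmented_answer(question: str) -> str:
    """Retrieve evidence, prepend it to the question, then decode."""
    evidence = "\n".join(retrieve(question))
    prompt = f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

if __name__ == "__main__":
    print(retrieval_augmented_answer("How does RETRO use retrieval?"))
```
In contrast, RETRO-style pretraining wires retrieval into the decoder itself rather than only into the prompt, which is what the study above evaluates.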
Related papers
- MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models [34.39053202801489]
In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts.
We propose a novel query rewriting method MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and generated results.
Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
arXiv Detail & Related papers (2024-08-30T07:57:30Z)
- RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback that aligns well with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z)
- Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems [22.142588104314175]
We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs).
We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore.
We design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries.
arXiv Detail & Related papers (2024-02-27T19:08:05Z)
- Sequencing Matters: A Generate-Retrieve-Generate Model for Building Conversational Agents [9.191944519634111]
This paper describes the work the Georgetown InfoSense group has done to address the challenges presented by TREC iKAT 2023.
Our submitted runs outperform the median runs by a significant margin, exhibiting superior performance in nDCG across various cut numbers and in overall success rate.
Our solution involves the use of Large Language Models (LLMs) for initial answers, answer grounding by BM25, passage quality filtering by logistic regression, and answer generation by LLMs again.
arXiv Detail & Related papers (2023-11-16T02:37:58Z)
- GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval [16.369071865207808]
We propose a novel GAR-meets-RAG recurrence formulation that overcomes the challenges of existing paradigms.
A key design principle is that the rewrite-retrieval stages improve the recall of the system and a final re-ranking stage improves the precision.
Our method establishes a new state-of-the-art in the BEIR benchmark, outperforming previous best results in Recall@100 and nDCG@10 metrics on 6 out of 8 datasets.
arXiv Detail & Related papers (2023-10-31T03:52:08Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models [56.51705482912727]
We present RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting.
Experimental results on the TREC 2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter model, although our effectiveness remains slightly behind reranking with GPT-4.
arXiv Detail & Related papers (2023-09-26T17:31:57Z)
- Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval [51.437420003471615]
We propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch.
RPT improves retrieval quality and subsequently perplexity across the board compared to strong baselines.
arXiv Detail & Related papers (2023-06-23T10:18:02Z)
- Improving language models by retrieving from trillions of tokens [50.42630445476544]
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus.
With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile.
arXiv Detail & Related papers (2021-12-08T17:32:34Z)
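The last related entry above describes the mechanism RETRO builds on: the input token stream is split into fixed-size chunks (64 tokens in RETRO) and each chunk retrieves its own nearest neighbours from a large database, which the decoder then reads through chunked cross-attention. Below is a toy sketch of that chunk-wise retrieval interface under simplified assumptions; the tiny in-memory RETRIEVAL_DB, the 8-token chunks, and the token-overlap score are illustrative, and the cross-attention step is only indicated by a comment.
```python
# Toy sketch of chunk-wise neighbour retrieval in the spirit of RETRO: the input
# is split into fixed-size chunks and each chunk fetches its own nearest
# neighbours from a (here tiny, in-memory) retrieval database. In the real model
# these neighbours feed chunked cross-attention inside the decoder; that step is
# only indicated by a comment. Chunk size, database, and score are assumptions.
CHUNK_SIZE = 8          # tokens per chunk (RETRO uses 64; kept small for the toy example)
NUM_NEIGHBOURS = 2      # neighbours retrieved per chunk

RETRIEVAL_DB = [
    "retrieval augmented language models condition on external text",
    "the pile is a large diverse pretraining corpus",
    "chunked cross attention lets the decoder read retrieved neighbours",
]

def chunks(tokens: list[str], size: int = CHUNK_SIZE) -> list[list[str]]:
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def overlap(chunk: list[str], passage: str) -> int:
    """Crude relevance score: number of shared token types."""
    return len(set(chunk) & set(passage.split()))

def neighbours_per_chunk(text: str) -> list[tuple[list[str], list[str]]]:
    """For each chunk of `text`, return its top retrieved neighbours."""
    out = []
    for chunk in chunks(text.lower().split()):
        ranked = sorted(RETRIEVAL_DB, key=lambda p: overlap(chunk, p), reverse=True)
        out.append((chunk, ranked[:NUM_NEIGHBOURS]))
        # A RETRO-style decoder would now attend to these neighbours through
        # chunked cross-attention while generating the next chunk.
    return out

if __name__ == "__main__":
    text = "retrieval augmented decoders read neighbours from a large corpus while generating text"
    for chunk, nbrs in neighbours_per_chunk(text):
        print(chunk, "->", nbrs)
```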