Few-shot Learning with Retrieval Augmented Language Models
- URL: http://arxiv.org/abs/2208.03299v2
- Date: Mon, 8 Aug 2022 15:01:33 GMT
- Title: Few-shot Learning with Retrieval Augmented Language Models
- Authors: Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio
Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel,
Edouard Grave
- Abstract summary: Large language models have shown impressive few-shot results on a wide range of tasks.
When knowledge is key to such results, massive parameter counts to store knowledge seem to be needed.
We present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples.
- Score: 75.63572749426473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have shown impressive few-shot results on a wide range
of tasks. However, when knowledge is key for such results, as is the case for
tasks such as question answering and fact checking, massive parameter counts to
store knowledge seem to be needed. Retrieval augmented models are known to
excel at knowledge intensive tasks without the need for as many parameters, but
it is unclear whether they work in few-shot settings. In this work we present
Atlas, a carefully designed and pre-trained retrieval augmented language model
able to learn knowledge intensive tasks with very few training examples. We
perform evaluations on a wide range of tasks, including MMLU, KILT and
NaturalQuestions, and study the impact of the content of the document index,
showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy
on Natural Questions using only 64 examples, outperforming a 540B-parameter
model by 3% despite having 50x fewer parameters.
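The abstract describes the general retrieval-augmented recipe: retrieve passages from a document index, then condition a language model on the question together with the retrieved text. The following is a minimal, self-contained sketch of that recipe, not Atlas's actual implementation; the toy index, the word-overlap retriever, and the stubbed generator are placeholders standing in for a learned dense retriever and a seq2seq reader.

```python
# Minimal sketch of retrieval-augmented question answering.
# Illustrative only: the toy index, word-overlap retriever, and stubbed
# generator are placeholders, not the components used by Atlas.
from collections import Counter
from typing import List

# Toy document index; a real system indexes a large corpus such as Wikipedia.
INDEX: List[str] = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
    "The Pacific Ocean is the largest and deepest ocean on Earth.",
]

def retrieve(query: str, top_k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query (a stand-in for a
    learned dense retriever)."""
    q = Counter(query.lower().split())
    scored = [(sum((Counter(doc.lower().split()) & q).values()), doc) for doc in INDEX]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def generate(prompt: str) -> str:
    """Placeholder for a seq2seq reader; a real system would call the
    language model's generation routine on this prompt."""
    return f"<answer conditioned on: {prompt[:60]}...>"

def answer(question: str, top_k: int = 2) -> str:
    # Prepend the retrieved passages to the question, then generate.
    passages = retrieve(question, top_k)
    prompt = "\n".join(f"context: {p}" for p in passages) + f"\nquestion: {question}"
    return generate(prompt)

print(answer("Where is the Eiffel Tower located?"))
```

Because the few-shot examples only need to teach the model how to use retrieved evidence rather than to store the knowledge itself, the document index can be swapped or updated without retraining, which is the property the abstract highlights.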
Related papers
- A RAG-Based Institutional Assistant [0.1499944454332829]
We design and evaluate a RAG-based virtual assistant specifically tailored for the University of Sao Paulo.
Our optimal retriever model achieves a Top-5 accuracy of 30%, while our most effective generative model scores 22.04% against ground truth answers.
arXiv Detail & Related papers (2025-01-23T17:54:19Z) - KCIF: Knowledge-Conditioned Instruction Following [4.945902994386117]
We study the interaction between knowledge and instruction following, and observe that LLMs struggle to follow simple answer-modifying instructions. Our results highlight a limitation in the traditional separation of knowledge/reasoning and instruction following, and suggest that studying these capabilities jointly is important.
arXiv Detail & Related papers (2024-10-16T19:07:37Z) - EXnet: Efficient In-context Learning for Data-less Text classification [0.0]
We present EXnet, a model specifically designed to perform in-context learning without limitations on the number of examples.
We argue that in-context learning is an effective method to increase task accuracy, and that providing examples facilitates cross-task generalization.
With extensive experiments, we show that even our smallest model (15M parameters) generalizes to several unseen classification tasks and domains.
arXiv Detail & Related papers (2023-05-24T01:40:57Z) - Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language
Models [58.42146641102329]
We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC)
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC only needs a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
arXiv Detail & Related papers (2022-10-28T23:18:43Z) - Zero-Shot Learners for Natural Language Understanding via a Unified
Multiple Choice Perspective [26.41585967095811]
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training.
Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN.
Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification.
arXiv Detail & Related papers (2022-10-16T17:24:06Z) - PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z) - Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z) - Knowledge-driven Data Construction for Zero-shot Evaluation in
Commonsense Question Answering [80.60605604261416]
We propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks.
We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks.
We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
arXiv Detail & Related papers (2020-11-07T22:52:21Z) - Meta-learning for Few-shot Natural Language Processing: A Survey [10.396506243272158]
Few-shot natural language processing (NLP) refers to NLP tasks that are accompanied with merely a handful of labeled examples.
This paper focuses on the NLP domain, especially few-shot applications.
We try to provide clearer definitions, a summary of progress, and common datasets for applying meta-learning to few-shot NLP.
arXiv Detail & Related papers (2020-07-19T06:36:41Z) - Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation (a minimal sketch of this combination appears after this list).
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z) - Knowledge Guided Metric Learning for Few-Shot Text Classification [22.832467388279873]
Inspired by human intelligence, we propose to introduce external knowledge into few-shot learning to imitate human knowledge.
We demonstrate that our method outperforms the state-of-the-art few-shot text classification models.
arXiv Detail & Related papers (2020-04-04T10:56:26Z)
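The Retrieval-Augmented Generation entry above describes combining a pre-trained parametric model with a non-parametric memory of retrieved documents. The sketch below shows one common way to combine the two, marginalizing the generator's answer probability over retrieved documents weighted by retrieval scores; the scores and per-document probabilities are invented for illustration and do not come from any real model.

```python
# Illustrative sketch: combining parametric and non-parametric memory by
# marginalizing over retrieved documents (RAG-Sequence style).
# All numbers below are invented for the example.
import math

# Non-parametric memory: retrieved documents with retriever scores and the
# generator's (parametric) probability of the answer given each document.
retrieved = [
    {"doc": "doc_a", "score": 2.0, "p_answer_given_doc": 0.70},
    {"doc": "doc_b", "score": 0.5, "p_answer_given_doc": 0.10},
]

# Softmax over retrieval scores approximates p(doc | query).
z = sum(math.exp(d["score"]) for d in retrieved)
for d in retrieved:
    d["p_doc"] = math.exp(d["score"]) / z

# p(answer | query) = sum over docs of p(doc | query) * p(answer | query, doc)
p_answer = sum(d["p_doc"] * d["p_answer_given_doc"] for d in retrieved)
print(f"p(answer | query) = {p_answer:.3f}")
```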
This list is automatically generated from the titles and abstracts of the papers on this site.