How Much Knowledge Can You Pack Into the Parameters of a Language Model?
- URL: http://arxiv.org/abs/2002.08910v4
- Date: Mon, 5 Oct 2020 21:26:45 GMT
- Title: How Much Knowledge Can You Pack Into the Parameters of a Language Model?
- Authors: Adam Roberts, Colin Raffel, and Noam Shazeer
- Abstract summary: It has been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries.
We measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge.
- Score: 44.81324633069311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has recently been observed that neural language models trained on
unstructured text can implicitly store and retrieve knowledge using natural
language queries. In this short paper, we measure the practical utility of this
approach by fine-tuning pre-trained models to answer questions without access
to any external context or knowledge. We show that this approach scales with
model size and performs competitively with open-domain systems that explicitly
retrieve answers from an external knowledge source when answering questions. To
facilitate reproducibility and future work, we release our code and trained
models at https://goo.gle/t5-cbqa.
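In the closed-book setting studied here, the question alone is the model's input and the answer is generated purely from what is stored in the parameters. Below is a minimal sketch of that setup using the Hugging Face transformers API; the checkpoint name is an assumption standing in for whichever released T5-CBQA model you load from the link above, not something prescribed by the paper.

```python
# Minimal closed-book QA sketch: no retrieved passages and no external context;
# the answer comes entirely from the model's parameters.
# The checkpoint name is an assumption -- substitute the T5-CBQA checkpoint
# you actually have available.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/t5-large-ssm-nq"  # assumed closed-book QA checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

question = "When was Franklin D. Roosevelt born?"
inputs = tokenizer(question, return_tensors="pt")
answer_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

Since the abstract's central finding is that closed-book accuracy scales with model size, swapping in a larger checkpoint is the main lever in this setup.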
Related papers
- From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries [6.382667978271587]
Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt.
This approach has risen in popularity due to its practical use in applications of language models such as search, question answering, and chatbots.
In this paper, we mechanistically examine the RAG pipeline to highlight that language models take a shortcut and have a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory.
arXiv Detail & Related papers (2024-06-18T17:46:08Z)
- RECKONING: Reasoning through Dynamic Knowledge Encoding [51.076603338764706]
We show that language models can answer questions by reasoning over knowledge provided as part of the context.
However, since the provided knowledge is often not filtered for the question, in-context reasoning can be sensitive to distractor facts; in these situations, the model fails to distinguish the knowledge that is necessary to answer the question.
We propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters.
arXiv Detail & Related papers (2023-05-10T17:54:51Z)
- Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models [58.42146641102329]
We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC).
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC only needs a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
arXiv Detail & Related papers (2022-10-28T23:18:43Z)
- Automatic Short Math Answer Grading via In-context Meta-learning [2.0263791972068628]
We study the problem of automatic short answer grading for students' responses to math questions.
We use MathBERT, a variant of the popular language model BERT adapted to mathematical content, as our base model.
We also use an in-context learning approach that provides scoring examples as input to the language model.
arXiv Detail & Related papers (2022-05-30T16:26:02Z)
- Zero-shot Commonsense Question Answering with Cloze Translation and Consistency Optimization [20.14487209460865]
We investigate four translation methods that can translate natural questions into cloze-style sentences.
We show that our methods are complementary to a model improved with a knowledge base, and combining them can lead to state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2022-01-01T07:12:49Z)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)
- Unsupervised Commonsense Question Answering with Self-Talk [71.63983121558843]
We propose an unsupervised framework based on self-talk as a novel alternative approach to commonsense tasks.
Inspired by inquiry-based discovery learning, our approach queries language models with a number of information-seeking questions.
Empirical results demonstrate that the self-talk procedure substantially improves the performance of zero-shot language model baselines.
arXiv Detail & Related papers (2020-04-11T20:43:37Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). (A minimal retrieve-then-read sketch appears after this list.)
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
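Several of the related papers above (REALM, "How Context Affects Language Models' Factual Predictions", and the RAG probing work) contrast the closed-book setting with retrieve-then-read pipelines that place an external passage in the model's input. The sketch below illustrates that contrast under stated assumptions: a toy lexical retriever and an off-the-shelf seq2seq reader stand in for the learned retrievers and readers those papers use, and the corpus, scoring rule, prompt format, and checkpoint name are all illustrative.

```python
# Minimal retrieve-then-read sketch (illustrative only; not REALM's learned
# retriever or any paper's exact pipeline). A toy lexical retriever picks the
# passage with the most word overlap with the question, and that passage is
# prepended to the question before a seq2seq reader generates the answer.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

CORPUS = [
    "Franklin D. Roosevelt was born on January 30, 1882, in Hyde Park, New York.",
    "The Eiffel Tower was completed in 1889 for the Paris World's Fair.",
]

def retrieve(question: str) -> str:
    """Return the corpus passage with the largest word overlap with the question."""
    q_tokens = set(question.lower().split())
    return max(CORPUS, key=lambda p: len(q_tokens & set(p.lower().split())))

MODEL_NAME = "google/flan-t5-small"  # assumed reader checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
reader = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

question = "When was Franklin D. Roosevelt born?"
prompt = f"question: {question} context: {retrieve(question)}"
inputs = tokenizer(prompt, return_tensors="pt")
answer_ids = reader.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

Comparing this against the closed-book sketch above makes the trade-off concrete: the retrieval-augmented system can point to an explicit supporting passage, while the closed-book model must already hold the fact in its parameters.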