Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language
Models
- URL: http://arxiv.org/abs/2210.16433v3
- Date: Mon, 27 Mar 2023 07:33:14 GMT
- Title: Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language
Models
- Authors: Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu
Chen
- Abstract summary: We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC)
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC only needs a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
- Score: 58.42146641102329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully-parametric language models generally require a huge number of model
parameters to store the necessary knowledge for solving multiple natural
language tasks in zero/few-shot settings. In addition, it is hard to adapt to
the evolving world knowledge without the costly model re-training. In this
paper, we develop a novel semi-parametric language model architecture,
Knowledge-in-Context (KiC), which empowers a parametric text-to-text language
model with a knowledge-rich external memory. Specifically, the external memory
contains six different types of knowledge: entity, dictionary, commonsense,
event, script, and causality knowledge. For each input instance, the KiC model
adaptively selects a knowledge type and retrieves the most helpful pieces of
knowledge. The input instance along with its knowledge augmentation is fed into
a text-to-text model (e.g., T5) to generate the output answer, where both the
input and the output are in natural language forms after prompting.
Interestingly, we find that KiC can be identified as a special
mixture-of-experts (MoE) model, where the knowledge selector plays the role of
a router that is used to determine the sequence-to-expert assignment in MoE.
This key observation inspires us to develop a novel algorithm for training KiC
with an instance-adaptive knowledge selector. As a knowledge-rich
semi-parametric language model, KiC only needs a much smaller parametric part
to achieve superior zero-shot performance on unseen tasks. By evaluating on 40+
different tasks, we show that KiC_Large with 770M parameters easily outperforms
large language models (LMs) that are 4-39x larger by a large margin. We also
demonstrate that KiC exhibits emergent abilities at a much smaller model scale
compared to the fully-parametric models.
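To make the pipeline concrete, the following is a minimal sketch of the select-retrieve-generate loop described above, written in Python against the Hugging Face transformers T5 API. The knowledge selector, the per-type retrieval stub, and the prompt format are illustrative assumptions rather than the authors' released implementation; in KiC the selector is trained jointly with the backbone as an instance-adaptive MoE-style router.

from transformers import T5ForConditionalGeneration, T5Tokenizer

KNOWLEDGE_TYPES = ["entity", "dictionary", "commonsense", "event", "script", "causality"]

def retrieve(knowledge_type: str, query: str, k: int = 2) -> list[str]:
    # Stub for the external memory: return the top-k knowledge pieces of the
    # given type for this query (a real system would use a trained retriever).
    return [f"[{knowledge_type} knowledge relevant to: {query}]"] * k

def select_knowledge_type(query: str) -> str:
    # Instance-adaptive knowledge selector. In KiC this plays the role of an
    # MoE router that assigns each input sequence to one knowledge "expert";
    # here it is hard-coded purely for illustration.
    return "commonsense"

def kic_answer(query: str, model, tokenizer) -> str:
    ktype = select_knowledge_type(query)                  # route the instance
    knowledge = retrieve(ktype, query)                    # query external memory
    prompt = " ".join(knowledge) + " question: " + query  # knowledge-augmented input
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # t5-small stands in here for the 770M-parameter KiC_Large backbone.
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    print(kic_answer("Why do people carry umbrellas when it rains?", model, tokenizer))

Because routing happens once per input sequence, the sketch mirrors the sequence-to-expert assignment that lets KiC be read as a mixture-of-experts over knowledge types.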
Related papers
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws [51.68385617116854]
Scaling laws describe the relationship between the size of language models and their capabilities.
We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page.
A 7B model can store 14B bits of knowledge, surpassing the English Wikipedia and textbooks combined.
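Taken at face value, the quoted numbers amount to roughly two bits of factual knowledge per parameter (a back-of-the-envelope reading of the figures above, not a number stated here):

\[
\frac{14 \times 10^{9}\ \text{bits}}{7 \times 10^{9}\ \text{parameters}} \approx 2\ \text{bits per parameter}.
\]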
arXiv Detail & Related papers (2024-04-08T11:11:31Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Semi-Structured Chain-of-Thought: Integrating Multiple Sources of Knowledge for Improved Language Model Reasoning [10.839645156881573]
We introduce a novel semi-structured prompting approach that seamlessly integrates the model's parametric memory with unstructured knowledge from text documents and structured knowledge from knowledge graphs.
Experimental results on open-domain multi-hop question answering datasets demonstrate that our prompting method significantly surpasses existing techniques.
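As a rough illustration of what such a semi-structured prompt could look like (the passages, triples, question, and layout below are hypothetical, not taken from the paper):

# Hypothetical example of interleaving unstructured passages with structured
# knowledge-graph triples in a single prompt for a multi-hop question.
passages = ["Barack Obama was born in Honolulu."]
triples = [("Honolulu", "located in", "Hawaii"), ("Hawaii", "part of", "United States")]

context_lines = ["Passages:"] + passages + ["Facts:"] + [f"({s}, {r}, {o})" for s, r, o in triples]
prompt = "\n".join(context_lines) + "\nQuestion: In which country was Barack Obama born?\nAnswer:"
print(prompt)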
arXiv Detail & Related papers (2023-11-14T19:53:53Z)
- Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning [60.26952378997713]
Contrastive vision-language models (e.g. CLIP) are created by updating all the parameters of a vision model and language model through contrastive training.
We show that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training.
We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training.
arXiv Detail & Related papers (2023-03-21T14:12:08Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Knowledge Efficient Deep Learning for Natural Language Processing [2.2701338128113124]
This thesis focuses on adapting classical methods to modern deep learning models and algorithms.
First, we propose a knowledge-rich deep learning model (KRDL) as a unifying learning framework for incorporating prior knowledge into deep models.
Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions.
arXiv Detail & Related papers (2020-08-28T23:32:33Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? [44.81324633069311]
It has been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries.
We measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge.
arXiv Detail & Related papers (2020-02-10T18:55:58Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA)
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.