Decoupled Context Processing for Context Augmented Language Modeling
- URL: http://arxiv.org/abs/2210.05758v1
- Date: Tue, 11 Oct 2022 20:05:09 GMT
- Title: Decoupled Context Processing for Context Augmented Language Modeling
- Authors: Zonglin Li, Ruiqi Guo, Sanjiv Kumar
- Abstract summary: Language models can be augmented with a context retriever to incorporate knowledge from large external databases.
By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better efficiency, interpretability and modularity.
- Score: 33.89636308731306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models can be augmented with a context retriever to incorporate
knowledge from large external databases. By leveraging retrieved context, the
neural network does not have to memorize the massive amount of world knowledge
within its internal parameters, leading to better parameter efficiency,
interpretability and modularity. In this paper we examined a simple yet
effective architecture for incorporating external context into language models
based on a decoupled Encoder-Decoder architecture. We showed that such a simple
architecture achieves competitive results on auto-regressive language modeling
and open domain question answering tasks. We also analyzed the behavior of the
proposed model which performs grounded context transfer. Finally we discussed
the computational implications of such retrieval augmented models.
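The decoupling described in the abstract is straightforward to picture in code. Below is a minimal PyTorch sketch of the general idea, not the paper's released implementation: a separate encoder turns retrieved context into a memory that an auto-regressive decoder consumes only through cross-attention, so context encodings can be computed once and cached. The class name `DecoupledContextLM`, the model sizes, and the single-passage-per-query setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecoupledContextLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # The context encoder processes retrieved passages independently of the
        # query, so encoded context could be precomputed and cached offline.
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        # The decoder models the input auto-regressively and sees the encoded
        # context only through cross-attention over the "memory".
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, context_ids):
        memory = self.context_encoder(self.embed(context_ids))
        tgt = self.embed(input_ids)
        seq_len = tgt.size(1)
        # Causal mask so each position only attends to earlier input tokens.
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        hidden = self.decoder(tgt, memory, tgt_mask=causal_mask)
        return self.lm_head(hidden)  # next-token logits

# Toy usage: a batch of 2 queries, each paired with one retrieved context passage.
model = DecoupledContextLM()
logits = model(torch.randint(0, 32000, (2, 16)), torch.randint(0, 32000, (2, 64)))
print(logits.shape)  # torch.Size([2, 16, 32000])
```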
Related papers
- Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts [83.57864140378035]
This paper proposes a method to cover longer contexts in Open-Domain Question-Answering tasks.
It leverages a small encoder language model to encode contexts, and applies cross-attention between the encodings and the original inputs.
After fine-tuning, the method improves performance on two held-in datasets, four held-out datasets, and two in-context learning settings.
arXiv Detail & Related papers (2024-04-02T15:10:11Z) - Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability [39.42414275888214]
We evaluate 13 models capable of causal language modeling across a suite of synthetic in-context learning tasks.
All the considered architectures can perform in-context learning under a wider range of conditions than previously documented.
Several attention alternatives are sometimes competitive with, or even better than, transformers as in-context learners.
arXiv Detail & Related papers (2023-10-12T05:43:06Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Fine-Tune Language Models as Multi-Modal Differential Equation Solvers [14.181842691371935]
We present a transformation of in-context operator learning into a multi-modal paradigm.
In particular, we take inspiration from the recent success of large language models, and propose using "captions" to integrate human knowledge about the operator.
arXiv Detail & Related papers (2023-08-09T16:44:25Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Reimagining Retrieval Augmented Language Models for Answering Queries [23.373952699385427]
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison.
Such language models are semi-parametric: they combine model parameters with knowledge from external data sources to make their predictions.
arXiv Detail & Related papers (2023-06-01T18:08:51Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Interactively Generating Explanations for Transformer Language Models [14.306470205426526]
Transformer language models are state-of-the-art in a multitude of NLP tasks.
Recent methods aim to provide interpretability and explainability to black-box models.
We emphasize using prototype networks directly incorporated into the model architecture.
arXiv Detail & Related papers (2021-09-02T11:34:29Z) - GroupBERT: Enhanced Transformer Architecture with Efficient Grouped
Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z)