Updater-Extractor Architecture for Inductive World State Representations
- URL: http://arxiv.org/abs/2104.05500v1
- Date: Mon, 12 Apr 2021 14:30:11 GMT
- Title: Updater-Extractor Architecture for Inductive World State Representations
- Authors: Arseny Moskvichev, James A. Liu
- Abstract summary: We propose a transformer-based Updater-Extractor architecture and a training procedure that can work with sequences of arbitrary length.
We explicitly train the model to incorporate incoming information into its world state representation.
Empirically, we investigate the model performance on three different tasks, demonstrating its promise.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Developing NLP models traditionally involves two stages - training and
application. Retention of information acquired after training (at application
time) is architecturally limited by the size of the model's context window (in
the case of transformers), or by the practical difficulties associated with
long sequences (in the case of RNNs). In this paper, we propose a novel
transformer-based Updater-Extractor architecture and a training procedure that
can work with sequences of arbitrary length and refine its knowledge about the
world based on linguistic inputs. We explicitly train the model to incorporate
incoming information into its world state representation, obtaining strong
inductive generalization and the ability to handle extremely long-range
dependencies. We prove a lemma that provides a theoretical basis for our
approach. The result also provides insight into success and failure modes of
models trained with variants of Truncated Back-Propagation Through Time (such
as Transformer XL). Empirically, we investigate the model performance on three
different tasks, demonstrating its promise. This preprint is still a work in
progress. At present, we have focused on easily interpretable tasks, leaving the
application of the proposed ideas to practical NLP problems for future work.
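The core idea, a fixed-size world-state vector that an Updater refines with each incoming chunk and an Extractor queries, can be sketched as follows. This is a minimal illustration only: the paper's Updater and Extractor are transformers, whereas here both are plain linear maps, and all names (`StateModel`, `update`, `extract`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, INPUT_DIM, OUTPUT_DIM = 8, 4, 3

class StateModel:
    def __init__(self):
        # Updater parameters: fold an input chunk into the fixed-size state.
        self.W_h = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
        self.W_x = rng.normal(scale=0.1, size=(STATE_DIM, INPUT_DIM))
        # Extractor parameters: read task-relevant information out of the state.
        self.W_o = rng.normal(scale=0.1, size=(OUTPUT_DIM, STATE_DIM))

    def update(self, state, chunk):
        # Incorporate incoming information into the world-state vector.
        return np.tanh(self.W_h @ state + self.W_x @ chunk)

    def extract(self, state):
        # Query the current world state (e.g. to answer a question).
        return self.W_o @ state

model = StateModel()
state = np.zeros(STATE_DIM)
# Sequences of arbitrary length: the state stays the same size no matter
# how many chunks are consumed, unlike a transformer's context window.
for chunk in rng.normal(size=(100, INPUT_DIM)):
    state = model.update(state, chunk)
answer = model.extract(state)
print(state.shape, answer.shape)  # state stays (8,), output is (3,)
```

The point of the sketch is the interface, not the modules: memory cost is constant in sequence length because only the state vector crosses chunk boundaries, which is what the paper's training procedure exploits.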
Related papers
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts.
arXiv Detail & Related papers (2023-07-04T02:47:42Z)
- Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Can Wikipedia Help Offline Reinforcement Learning? [12.12541097531412]
Fine-tuning reinforcement learning models has been challenging because of a lack of large scale off-the-shelf datasets.
Recent work has looked at tackling offline RL, with improved results following the introduction of the Transformer architecture.
We investigate the transferability of pre-trained sequence models on other domains (vision, language) when finetuned on offline RL tasks.
arXiv Detail & Related papers (2022-01-28T13:55:35Z)
- Transformers: "The End of History" for NLP? [17.36054090232896]
We shed light on some important theoretical limitations of pre-trained BERT-style models.
We show that addressing these limitations can yield sizable improvements over vanilla RoBERTa and XLNet.
We offer a more general discussion on desiderata for future additions to the Transformer architecture.
arXiv Detail & Related papers (2021-04-09T08:29:42Z) - The NLP Cookbook: Modern Recipes for Transformer based Deep Learning
Architectures [0.0]
Natural Language Processing models have achieved phenomenal success in linguistic and semantic tasks.
Recent NLP architectures have utilized concepts of transfer learning, pruning, quantization, and knowledge distillation to achieve moderate model sizes.
Knowledge Retrievers have been built to extricate explicit data documents from a large corpus of databases with greater efficiency and accuracy.
arXiv Detail & Related papers (2021-03-23T22:38:20Z) - UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z) - A Recurrent Vision-and-Language BERT for Navigation [54.059606864535304]
We propose a recurrent BERT model that is time-aware for use in vision-and-language navigation.
Our model can replace more complex encoder-decoder models to achieve state-of-the-art results.
arXiv Detail & Related papers (2020-11-26T00:23:00Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- On the comparability of Pre-trained Language Models [0.0]
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP.
More elaborated architectures are making better use of contextual information.
Larger corpora are used as resources for pre-training large language models in a self-supervised fashion.
Advances in parallel computing as well as in cloud computing have made it possible to train these models with growing capacities in the same or even shorter time than previously established models.
arXiv Detail & Related papers (2020-01-03T10:53:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.