Updater-Extractor Architecture for Inductive World State Representations
- URL: http://arxiv.org/abs/2104.05500v1
- Date: Mon, 12 Apr 2021 14:30:11 GMT
- Title: Updater-Extractor Architecture for Inductive World State Representations
- Authors: Arseny Moskvichev, James A. Liu
- Abstract summary: We propose a transformer-based Updater-Extractor architecture and a training procedure that can work with sequences of arbitrary length.
We explicitly train the model to incorporate incoming information into its world state representation.
Empirically, we investigate the model performance on three different tasks, demonstrating its promise.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Developing NLP models traditionally involves two stages - training and
application. Retention of information acquired after training (at application
time) is architecturally limited by the size of the model's context window (in
the case of transformers), or by the practical difficulties associated with
long sequences (in the case of RNNs). In this paper, we propose a novel
transformer-based Updater-Extractor architecture and a training procedure that
can work with sequences of arbitrary length and refine its knowledge about the
world based on linguistic inputs. We explicitly train the model to incorporate
incoming information into its world state representation, obtaining strong
inductive generalization and the ability to handle extremely long-range
dependencies. We prove a lemma that provides a theoretical basis for our
approach. The result also provides insight into success and failure modes of
models trained with variants of Truncated Back-Propagation Through Time (such
as Transformer XL). Empirically, we investigate the model performance on three
different tasks, demonstrating its promise. This preprint is still a work in
progress. At present, we have focused on easily interpretable tasks, leaving the
application of the proposed ideas to practical NLP problems for future work.
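The core idea, a fixed-size world-state vector that an Updater refines with each incoming chunk and an Extractor queries, can be sketched as follows. This is a minimal illustration only: the paper's Updater and Extractor are transformers, whereas here both are plain linear maps, and all names (`StateModel`, `update`, `extract`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, INPUT_DIM, OUTPUT_DIM = 8, 4, 3

class StateModel:
    def __init__(self):
        # Updater parameters: fold an input chunk into the fixed-size state.
        self.W_h = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
        self.W_x = rng.normal(scale=0.1, size=(STATE_DIM, INPUT_DIM))
        # Extractor parameters: read task-relevant information out of the state.
        self.W_o = rng.normal(scale=0.1, size=(OUTPUT_DIM, STATE_DIM))

    def update(self, state, chunk):
        # Incorporate incoming information into the world-state vector.
        return np.tanh(self.W_h @ state + self.W_x @ chunk)

    def extract(self, state):
        # Query the current world state (e.g. to answer a question).
        return self.W_o @ state

model = StateModel()
state = np.zeros(STATE_DIM)
# Sequences of arbitrary length: the state stays the same size no matter
# how many chunks are consumed, unlike a transformer's context window.
for chunk in rng.normal(size=(100, INPUT_DIM)):
    state = model.update(state, chunk)
answer = model.extract(state)
print(state.shape, answer.shape)  # state stays (8,), output is (3,)
```

The point of the sketch is the interface, not the modules: memory cost is constant in sequence length because only the state vector crosses chunk boundaries, which is what the paper's training procedure exploits.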
Related papers
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts.
arXiv Detail & Related papers (2023-07-04T02:47:42Z)
- Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Can Wikipedia Help Offline Reinforcement Learning? [12.12541097531412]
Fine-tuning reinforcement learning models has been challenging because of a lack of large scale off-the-shelf datasets.
Recent work has looked at tackling offline RL, with improved results following the introduction of the Transformer architecture.
We investigate the transferability of pre-trained sequence models on other domains (vision, language) when finetuned on offline RL tasks.
arXiv Detail & Related papers (2022-01-28T13:55:35Z)
- Transformers: "The End of History" for NLP? [17.36054090232896]
We shed light on some important theoretical limitations of pre-trained BERT-style models.
We show that addressing these limitations can yield sizable improvements over vanilla RoBERTa and XLNet.
We offer a more general discussion on desiderata for future additions to the Transformer architecture.
arXiv Detail & Related papers (2021-04-09T08:29:42Z) - The NLP Cookbook: Modern Recipes for Transformer based Deep Learning
Architectures [0.0]
Natural Language Processing models have achieved phenomenal success in linguistic and semantic tasks.
Recent NLP architectures have utilized concepts of transfer learning, pruning, quantization, and knowledge distillation to achieve moderate model sizes.
Knowledge Retrievers have been built to extricate explicit data documents from a large corpus of databases with greater efficiency and accuracy.
arXiv Detail & Related papers (2021-03-23T22:38:20Z) - UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z) - A Recurrent Vision-and-Language BERT for Navigation [54.059606864535304]
We propose a recurrent BERT model that is time-aware for use in vision-and-language navigation.
Our model can replace more complex encoder-decoder models to achieve state-of-the-art results.
arXiv Detail & Related papers (2020-11-26T00:23:00Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- On the comparability of Pre-trained Language Models [0.0]
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP.
More elaborated architectures are making better use of contextual information.
Larger corpora are used as resources for pre-training large language models in a self-supervised fashion.
Advances in parallel computing as well as in cloud computing have made it possible to train these models with growing capacities in the same or even shorter time than previously established models.
arXiv Detail & Related papers (2020-01-03T10:53:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.