Learning Associative Inference Using Fast Weight Memory
- URL: http://arxiv.org/abs/2011.07831v2
- Date: Tue, 23 Feb 2021 17:00:19 GMT
- Title: Learning Associative Inference Using Fast Weight Memory
- Authors: Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber
- Abstract summary: We augment the LSTM model with an associative memory, dubbed Fast Weight Memory (FWM).
Our model is trained end-to-end by gradient descent and yields excellent performance on compositional language reasoning problems.
- Score: 12.239487954915646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans can quickly associate stimuli to solve problems in novel contexts. Our
novel neural network model learns state representations of facts that can be
composed to perform such associative inference. To this end, we augment the
LSTM model with an associative memory, dubbed Fast Weight Memory (FWM). Through
differentiable operations at every step of a given input sequence, the LSTM
updates and maintains compositional associations stored in the rapidly changing
FWM weights. Our model is trained end-to-end by gradient descent and yields
excellent performance on compositional language reasoning problems,
meta-reinforcement-learning for POMDPs, and small-scale word-level language
modelling.
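As an illustration of the kind of mechanism the abstract describes, the sketch below implements a generic outer-product fast-weight memory driven by a toy recurrent controller in NumPy. All names, dimensions, and the single-key update rule here are illustrative assumptions: the paper's FWM uses a trained LSTM controller and composes two keys into a higher-order association, so this is a simplified stand-in rather than the authors' exact method.

```python
import numpy as np

# Minimal sketch (not the paper's exact equations): a slow recurrent controller
# emits a key, a value, and a write strength at every step; the fast weight
# matrix F stores key->value associations via outer products and is queried
# by a matrix-vector product. All dimensions and weights are illustrative.

rng = np.random.default_rng(0)
d_hidden, d_key, d_val = 32, 16, 16

# "Slow" weights of a toy recurrent controller (stand-in for the LSTM).
W_h = rng.normal(0, 0.1, (d_hidden, d_hidden))
W_x = rng.normal(0, 0.1, (d_hidden, d_hidden))
W_k = rng.normal(0, 0.1, (d_key, d_hidden))
W_v = rng.normal(0, 0.1, (d_val, d_hidden))
W_b = rng.normal(0, 0.1, (1, d_hidden))

def step(x, h, F):
    """One time step: update the controller state, write to and read from F."""
    h = np.tanh(W_h @ h + W_x @ x)          # slow recurrent update
    k = np.tanh(W_k @ h)                    # key
    v = np.tanh(W_v @ h)                    # value to associate with the key
    beta = 1 / (1 + np.exp(-(W_b @ h)))     # write strength in (0, 1)
    v_old = F @ k                           # value currently bound to k
    F = F + beta * np.outer(v - v_old, k)   # overwrite the association
    read = F @ k                            # retrieve for downstream use
    return h, F, read

h = np.zeros(d_hidden)
F = np.zeros((d_val, d_key))                # fast weights, reset per episode
for t in range(5):
    x = rng.normal(size=d_hidden)           # dummy input embedding
    h, F, read = step(x, h, F)
print(read.shape)                           # (16,)
```

The write step subtracts the value currently bound to the key before adding the new one, so repeated writes overwrite rather than accumulate; this is the usual way outer-product fast-weight memories keep associations current while staying end-to-end differentiable.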
Related papers
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system.
In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- Scaling Properties of Speech Language Models [4.0142527158949415]
Speech Language Models (SLMs) aim to learn language from raw audio, without textual resources.
We estimate the scale at which our current methods will yield an SLM with the English proficiency of text-based Large Language Models (LLMs).
arXiv Detail & Related papers (2024-03-31T13:30:12Z)
- RelationVLM: Making Large Vision-Language Models Understand Visual Relations [66.70252936043688]
We present RelationVLM, a large vision-language model capable of comprehending various levels and types of relations whether across multiple images or within a video.
Specifically, we devise a multi-stage relation-aware training scheme and a series of corresponding data configuration strategies to bestow RelationVLM with the capabilities of understanding semantic relations.
arXiv Detail & Related papers (2024-03-19T15:01:19Z)
- CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory [38.429707659685974]
Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs.
We introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training.
This architecture, which we call CAMELoT, demonstrates superior performance even with a tiny context window of 128 tokens.
arXiv Detail & Related papers (2024-02-21T01:00:17Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Continual Variational Autoencoder Learning via Online Cooperative Memorization [11.540150938141034]
Variational Autoencoders (VAE) have been successfully used in continual learning classification tasks.
However, their ability to generate images with specifications corresponding to the classes and databases learned during Continual Learning is not well understood.
We develop a new theoretical framework that formulates CL as a dynamic optimal transport problem.
We then propose a novel memory buffering approach, namely the Online Cooperative Memorization (OCM) framework.
arXiv Detail & Related papers (2022-07-20T18:19:27Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.