History Compression via Language Models in Reinforcement Learning
- URL: http://arxiv.org/abs/2205.12258v1
- Date: Tue, 24 May 2022 17:59:29 GMT
- Title: History Compression via Language Models in Reinforcement Learning
- Authors: Fabian Paischer, Thomas Adler, Vihang Patil, Angela Bitto-Nemling,
Markus Holzleitner, Sebastian Lehner, Hamid Eghbal-zadeh, Sepp Hochreiter
- Abstract summary: In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP.
We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency.
- Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer as a memory module for history representation.
- Score: 5.937618881286057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a partially observable Markov decision process (POMDP), an agent typically
uses a representation of the past to approximate the underlying MDP. We propose
to utilize a frozen Pretrained Language Transformer (PLT) for history
representation and compression to improve sample efficiency. To avoid training
of the Transformer, we introduce FrozenHopfield, which automatically associates
observations with original token embeddings. To form these associations, a
modern Hopfield network stores the original token embeddings, which are
retrieved by queries that are obtained by a random but fixed projection of
observations. Our new method, HELM, enables actor-critic network architectures
that contain a pretrained language Transformer as a memory module for history
representation. Since a representation of the past need not be learned, HELM is
much more sample-efficient than competitors. On Minigrid and Procgen
environments, HELM achieves new state-of-the-art results. Our code is available
at https://github.com/ml-jku/helm.
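Concretely, the FrozenHopfield step can be pictured as a random but fixed projection of the observation into the language model's embedding space, followed by a softmax-weighted (modern Hopfield) retrieval over the frozen token embeddings. The snippet below is a minimal sketch of that idea with toy dimensions and a generic embedding matrix; it is not the authors' implementation (see the linked repository for that), and the class and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FrozenHopfield(nn.Module):
    """Sketch: associate observations with frozen token embeddings via a
    random-but-fixed projection and a softmax (modern Hopfield) retrieval."""

    def __init__(self, token_embeddings: torch.Tensor, obs_dim: int, beta: float = 1.0):
        super().__init__()
        d_model = token_embeddings.shape[1]
        # Frozen vocabulary embeddings act as the stored patterns (no gradients).
        self.register_buffer("memory", token_embeddings)            # (V, d_model)
        # Random but fixed projection from observation space to query space.
        self.register_buffer("proj", torch.randn(d_model, obs_dim) / obs_dim ** 0.5)
        self.beta = beta

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (B, obs_dim) -> query: (B, d_model)
        query = obs @ self.proj.T
        # Softmax retrieval over the stored token embeddings.
        attn = torch.softmax(self.beta * query @ self.memory.T, dim=-1)  # (B, V)
        return attn @ self.memory                                        # (B, d_model)


if __name__ == "__main__":
    vocab, d_model, obs_dim = 1000, 64, 147            # toy sizes, not the paper's
    token_emb = torch.randn(vocab, d_model)             # stands in for a PLT's embeddings
    fh = FrozenHopfield(token_emb, obs_dim)
    pseudo_tokens = fh(torch.randn(8, obs_dim))          # 8 observations -> pseudo-tokens
    print(pseudo_tokens.shape)                           # torch.Size([8, 64])
```

In HELM, such pseudo-tokens would be fed to the frozen language Transformer, whose output serves as the compressed history representation consumed by the actor-critic heads.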
Related papers
- With a Little Help from your own Past: Prototypical Memory Networks for
Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples.
Our memory models the distribution of past keys and values through the definition of prototype vectors.
We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z)
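As a loose illustration of the prototype-memory idea in the entry above, the sketch below attends over a small fixed bank of prototype key/value vectors (e.g., obtained by clustering keys and values collected from past training samples) and adds the retrieved values back to the current activations. The module name, shapes, and the residual read are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeMemoryAttention(nn.Module):
    """Sketch: cross-attention over a fixed bank of prototype keys/values that
    summarize activations collected from past training samples."""

    def __init__(self, d_model: int, proto_keys: torch.Tensor, proto_values: torch.Tensor):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        # Prototype vectors are treated as a frozen external memory here.
        self.register_buffer("keys", proto_keys)       # (P, d_model)
        self.register_buffer("values", proto_values)   # (P, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, d_model) activations of the current sample
        q = self.q_proj(x)
        scores = q @ self.keys.T / self.keys.shape[-1] ** 0.5   # (B, T, P)
        attn = F.softmax(scores, dim=-1)
        return x + attn @ self.values                            # residual memory read


if __name__ == "__main__":
    d_model, num_protos = 64, 16
    # Toy prototypes; in practice they could be k-means centroids of past keys/values.
    protos_k, protos_v = torch.randn(num_protos, d_model), torch.randn(num_protos, d_model)
    layer = PrototypeMemoryAttention(d_model, protos_k, protos_v)
    print(layer(torch.randn(2, 10, d_model)).shape)   # torch.Size([2, 10, 64])
```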
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
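In the spirit of the masked-autoencoding summary above, the toy sketch below masks random elements of a state-action trajectory, encodes the corrupted sequence with a small Transformer, and reconstructs only the masked positions. Mask ratio, dimensions, and architecture are assumptions, not MaskDP's actual configuration.

```python
import torch
import torch.nn as nn

class MaskedTrajectoryAutoencoder(nn.Module):
    """Sketch: mask random trajectory tokens and reconstruct them."""

    def __init__(self, token_dim: int, d_model: int = 128, mask_ratio: float = 0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(token_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(d_model, token_dim)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (B, T, token_dim) interleaved state/action tokens
        B, T, _ = traj.shape
        mask = torch.rand(B, T, device=traj.device) < self.mask_ratio  # True = masked
        h = self.embed(traj)
        # Replace masked positions with a learned mask token.
        h = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(h), h)
        recon = self.decode(self.encoder(h))
        # Reconstruction loss only on the masked positions.
        return ((recon - traj) ** 2)[mask].mean()


if __name__ == "__main__":
    model = MaskedTrajectoryAutoencoder(token_dim=16)
    loss = model(torch.randn(4, 32, 16))     # 4 trajectories of length 32
    loss.backward()
    print(float(loss))
```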
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing approaches is an exemplar memory, which mitigates catastrophic forgetting by saving a subset of past data into a memory bank and using it to prevent forgetting when training on future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
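The exemplar-memory mechanism mentioned above boils down to a small replay buffer: keep a few samples per previously seen class and mix them into batches for the current task. The sketch below shows only this bookkeeping; selection strategies such as herding, and the paper's memory Transformer itself, are out of scope, and all names are illustrative.

```python
import random
from collections import defaultdict

class ExemplarMemory:
    """Sketch: keep up to `per_class` exemplars for every class seen so far and
    mix them into batches of the current task to reduce catastrophic forgetting."""

    def __init__(self, per_class: int = 20, seed: int = 0):
        self.per_class = per_class
        self.store = defaultdict(list)         # class id -> list of samples
        self.rng = random.Random(seed)

    def add(self, sample, label):
        bucket = self.store[label]
        if len(bucket) < self.per_class:
            bucket.append(sample)              # real systems often use herding instead
        else:
            # Randomly overwrite an old exemplar once the bucket is full.
            bucket[self.rng.randrange(self.per_class)] = sample

    def replay_batch(self, k: int):
        pool = [(s, c) for c, items in self.store.items() for s in items]
        return self.rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    mem = ExemplarMemory(per_class=2)
    for i in range(10):
        mem.add(sample=f"img_{i}", label=i % 3)    # three old classes
    # When training on a new task, append replayed exemplars to the fresh batch.
    print(mem.replay_batch(4))
```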
- Improving language models by retrieving from trillions of tokens [50.42630445476544]
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus.
With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile.
arXiv Detail & Related papers (2021-12-08T17:32:34Z)
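The retrieval step behind a retrieval-enhanced language model like the one above can be sketched as: embed corpus chunks, find the nearest neighbours of the current chunk, and hand the retrieved text to the model as extra conditioning context. The brute-force search and hash-based embedder below are deliberate simplifications with made-up sizes, not RETRO's actual pipeline.

```python
import numpy as np

def embed(texts, dim=64, seed=0):
    """Stand-in embedder: hash-based bag-of-words projection (illustrative only)."""
    rng = np.random.default_rng(seed)
    table = rng.standard_normal((50_000, dim))
    vecs = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            vecs[i] += table[hash(tok) % 50_000]
    norms = np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9
    return vecs / norms

def retrieve(query_chunk, corpus_chunks, k=2):
    """Return the k nearest corpus chunks by cosine similarity (brute force)."""
    q = embed([query_chunk])[0]
    C = embed(corpus_chunks)
    idx = np.argsort(-(C @ q))[:k]
    return [corpus_chunks[i] for i in idx]

if __name__ == "__main__":
    corpus = [
        "reinforcement learning with partial observability",
        "modern hopfield networks store and retrieve patterns",
        "recipes for sourdough bread baking",
    ]
    neighbours = retrieve("hopfield networks for memory retrieval", corpus)
    # A retrieval-enhanced LM would attend over these neighbours (e.g. via
    # cross-attention) while predicting the next chunk of the input.
    print(neighbours)
```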
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
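The bottleneck construction described above can be sketched as: pool a frozen encoder's token representations into a single sentence vector, then train a small decoder head to reconstruct the (noised) input from that vector alone. In the sketch below a random embedding table stands in for the frozen pretrained model, a plain Transformer layer stands in for the modified decoder, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class SentenceBottleneckAE(nn.Module):
    """Sketch: frozen encoder states -> single bottleneck vector -> shallow
    decoder trained with a denoising reconstruction objective."""

    def __init__(self, vocab: int, d_model: int = 128, bottleneck: int = 64):
        super().__init__()
        self.frozen_emb = nn.Embedding(vocab, d_model)      # stand-in for a frozen PLM
        self.frozen_emb.weight.requires_grad_(False)
        self.to_bottleneck = nn.Linear(d_model, bottleneck)       # trainable
        self.from_bottleneck = nn.Linear(bottleneck, d_model)     # trainable
        # A plain single Transformer layer stands in for the modified decoder.
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=1)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T) ids of the clean sentence
        noisy = tokens.clone()
        noisy[torch.rand_like(tokens, dtype=torch.float) < 0.15] = 0   # crude masking noise
        h = self.frozen_emb(noisy)                                     # (B, T, d_model)
        z = self.to_bottleneck(h.mean(dim=1))                          # (B, bottleneck)
        # Broadcast the sentence vector back to every position and decode.
        dec_in = self.from_bottleneck(z).unsqueeze(1).expand_as(h)
        logits = self.lm_head(self.decoder(dec_in))
        return nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                           tokens.reshape(-1))


if __name__ == "__main__":
    model = SentenceBottleneckAE(vocab=1000)
    loss = model(torch.randint(1, 1000, (4, 12)))
    loss.backward()
    print(float(loss))
```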
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
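A compressed illustration of the "correcting" half of the scheme above: an auxiliary model performs mask-and-predict to corrupt the input, and the main model is trained to recover the original token at every position of the corrupted sequence. The sequence-level contrastive term and the auxiliary model's own training loss are omitted, and the tiny models and hyperparameters below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_lm(vocab, d_model=64):
    """Toy per-token model: embeddings + one Transformer layer + output head."""
    layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    return nn.ModuleDict({
        "emb": nn.Embedding(vocab, d_model),
        "body": nn.TransformerEncoder(layer, num_layers=1),
        "head": nn.Linear(d_model, vocab),
    })

def forward_lm(lm, tokens):
    return lm["head"](lm["body"](lm["emb"](tokens)))            # (B, T, vocab)

def corrective_lm_step(aux, main, tokens, mask_id=0, mask_ratio=0.15):
    # 1) Auxiliary model does mask-and-predict to corrupt the input
    #    (its own MLM training loss is omitted in this sketch).
    masked = tokens.clone()
    pos = torch.rand(tokens.shape) < mask_ratio
    masked[pos] = mask_id
    with torch.no_grad():
        samples = forward_lm(aux, masked).argmax(-1)            # plausible replacements
    corrupted = torch.where(pos, samples, tokens)
    # 2) Main model must recover the original token at every position.
    logits = forward_lm(main, corrupted)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), tokens.reshape(-1))

if __name__ == "__main__":
    vocab = 1000
    aux, main = tiny_lm(vocab), tiny_lm(vocab)
    loss = corrective_lm_step(aux, main, torch.randint(1, vocab, (4, 16)))
    loss.backward()
    print(float(loss))
```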
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
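Replaced token detection, as summarized in the last entry, pairs a small generator that fills in masked positions with a discriminator that classifies every token of the resulting sequence as original or replaced. The toy, self-contained sketch below uses assumed sizes and is not the official ELECTRA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenClassifier(nn.Module):
    """Toy per-token model: embeddings + one Transformer layer + a linear head."""
    def __init__(self, vocab, out_dim, d_model=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, tokens):
        return self.head(self.body(self.emb(tokens)))

def replaced_token_detection_step(generator, discriminator, tokens,
                                  mask_id=0, mask_ratio=0.15):
    # 1) Corrupt: mask some positions and let the generator propose replacements.
    pos = torch.rand(tokens.shape) < mask_ratio
    masked = torch.where(pos, torch.full_like(tokens, mask_id), tokens)
    gen_logits = generator(masked)                                    # (B, T, vocab)
    gen_loss = F.cross_entropy(gen_logits[pos], tokens[pos])          # MLM loss on masks
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(pos, sampled, tokens)
    # 2) Discriminate: was each token replaced (and not accidentally identical)?
    is_replaced = (corrupted != tokens).float()
    disc_logits = discriminator(corrupted).squeeze(-1)                # (B, T)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    # The paper weights the discriminator term more heavily (lambda = 50).
    return gen_loss + 50.0 * disc_loss

if __name__ == "__main__":
    vocab = 1000
    gen = TokenClassifier(vocab, out_dim=vocab, d_model=32)    # small generator
    disc = TokenClassifier(vocab, out_dim=1)                   # per-token binary head
    loss = replaced_token_detection_step(gen, disc, torch.randint(1, vocab, (4, 16)))
    loss.backward()
    print(float(loss))
```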
This list is automatically generated from the titles and abstracts of the papers on this site.