Improving Sample Efficiency of Value Based Models Using Attention and
Vision Transformers
- URL: http://arxiv.org/abs/2202.00710v1
- Date: Tue, 1 Feb 2022 19:03:03 GMT
- Title: Improving Sample Efficiency of Value Based Models Using Attention and
Vision Transformers
- Authors: Amir Ardalan Kalantari, Mohammad Amini, Sarath Chandar, Doina Precup
- Abstract summary: We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture reduces sample complexity for several Atari environments, while also achieving better performance in some of the games.
- Score: 52.30336730712544
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Much of recent Deep Reinforcement Learning success is owed to the neural
architecture's capacity to learn and use effective internal representations of
the world. While many current algorithms train on large amounts of data from a
simulator, in realistic settings, such as games played against human opponents,
collecting experience can be quite costly. In this paper, we introduce a deep
reinforcement learning architecture whose purpose is to increase sample
efficiency without sacrificing performance. We design this architecture by
incorporating recent advances from the fields of Natural Language Processing
and Computer Vision. Specifically, we propose a visually attentive model that
uses transformers to learn a self-attention mechanism on the feature maps of
the state representation, while simultaneously optimizing return. We
demonstrate empirically that this architecture reduces sample complexity for
several Atari environments, while also achieving better performance in some of
the games.
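As a rough illustration of the proposed design (a minimal sketch under assumptions, not the authors' exact implementation; the layer sizes, the use of PyTorch, the mean pooling, and the omission of positional embeddings are all illustrative choices), a CNN encoder produces spatial feature maps, each spatial position becomes a token for a transformer self-attention layer, and the attended features feed a Q-value head:

```python
import torch
import torch.nn as nn

class AttentiveQNetwork(nn.Module):
    """Sketch: self-attention over CNN feature maps feeding a Q-value head."""
    def __init__(self, n_actions: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        # Standard Atari-style convolutional encoder (illustrative sizes).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Self-attention across spatial tokens (positional embeddings
        # omitted here for brevity).
        self.attention = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True
        )
        self.q_head = nn.Linear(embed_dim, n_actions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, 4, 84, 84) stacked grayscale frames.
        fmap = self.encoder(frames)               # (B, C, H, W)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per cell
        attended = self.attention(tokens)         # attention over feature-map cells
        pooled = attended.mean(dim=1)             # aggregate spatial tokens
        return self.q_head(pooled)                # (B, n_actions) Q-values

q_net = AttentiveQNetwork(n_actions=6)
q_values = q_net(torch.randn(1, 4, 84, 84))
```

In a DQN-style setup, a module like this would replace the usual convolutional-plus-dense value network; the return-based training loop itself would be unchanged.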
Related papers
- Transformers and Slot Encoding for Sample Efficient Physical World Modelling [1.5498250598583487]
We propose an architecture combining Transformers for world modelling with the slot-attention paradigm, an approach for learning representations of objects appearing in a scene.
We describe the resulting neural architecture and report experimental results showing improved sample efficiency over existing solutions and reduced performance variance across training examples.
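For reference, a minimal sketch of the slot-attention step mentioned above, following the commonly used formulation (the dimensions, iteration count, and omission of the usual LayerNorm/MLP refinements are simplifications): slots compete for input features via a softmax normalized over the slot axis, then each slot is updated from its attended input.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal slot-attention iteration: slots compete for input features."""
    def __init__(self, n_slots: int = 5, dim: int = 64, n_iters: int = 3):
        super().__init__()
        self.n_iters, self.scale = n_iters, dim ** -0.5
        self.slots_init = nn.Parameter(torch.randn(1, n_slots, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (B, N, dim) features, e.g. flattened feature-map cells.
        B = inputs.shape[0]
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_init.expand(B, -1, -1)
        for _ in range(self.n_iters):
            q = self.to_q(slots)
            logits = torch.einsum("bnd,bsd->bns", k, q) * self.scale
            # Softmax over slots: each input location is claimed by slots.
            attn = logits.softmax(dim=-1) + 1e-8
            attn = attn / attn.sum(dim=1, keepdim=True)  # weighted mean over inputs
            updates = torch.einsum("bns,bnd->bsd", attn, v)
            slots = self.gru(updates.reshape(-1, updates.shape[-1]),
                             slots.reshape(-1, slots.shape[-1])).view_as(slots)
        return slots  # (B, n_slots, dim): one vector per object-like entity
```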
arXiv Detail & Related papers (2024-05-30T15:48:04Z)
- Reusable Architecture Growth for Continual Stereo Matching [92.36221737921274]
We introduce a Reusable Architecture Growth (RAG) framework to learn new scenes continually in both supervised and self-supervised manners.
RAG can maintain high reusability during growth by reusing previous units while obtaining good performance.
We also present a Scene Router module to adaptively select the scene-specific architecture path at inference.
arXiv Detail & Related papers (2024-03-30T13:24:58Z) - Neural architecture impact on identifying temporally extended
Reinforcement Learning tasks [0.0]
We present attention-based architectures in the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari-2600 game suite.
In attention-based models, extracting the attention map and overlaying it onto the input images allows direct observation of the information the agent uses to select actions.
In addition, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we also develop a Vision Transformer-based architecture for the image-based RL domain.
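A hedged sketch of the visualization described above (the function name, shapes, and blending scheme are illustrative assumptions, not the paper's exact procedure): attention weights over spatial tokens are reshaped to the feature-map grid, upsampled to the input resolution, and alpha-blended over the frame.

```python
import torch
import torch.nn.functional as F

def overlay_attention(frame, attn, grid_hw, alpha: float = 0.6):
    """Blend a spatial attention map over an input frame for inspection.

    frame: (3, H, W) image in [0, 1]; attn: (n_tokens,) attention weights
    over feature-map cells; grid_hw: feature-map (height, width).
    """
    h, w = grid_hw
    heat = attn.reshape(1, 1, h, w)
    # Upsample token-level attention to the full image resolution.
    heat = F.interpolate(heat, size=frame.shape[-2:], mode="bilinear",
                         align_corners=False)[0]
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    # Alpha-blend: bright regions mark pixels the agent attends to.
    return (1 - alpha) * frame + alpha * heat

# Example: a 7x7 grid of attention weights from a transformer layer.
frame = torch.rand(3, 84, 84)
attn = torch.rand(49).softmax(dim=0)
vis = overlay_attention(frame, attn, grid_hw=(7, 7))
```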
arXiv Detail & Related papers (2023-10-04T21:09:19Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match state-of-the-art performance on common convolutional networks and outperform it by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
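A minimal sketch of the sparsity pattern described above (an illustration of the idea only; the full Top-KAST method also maintains a larger backward set for exploration, which is omitted here): each forward pass keeps only the top-K weights by magnitude, so the effective network is always sparse.

```python
import torch
import torch.nn as nn

class TopKLinear(nn.Module):
    """Linear layer that uses only its top-K-magnitude weights per forward pass."""
    def __init__(self, in_features: int, out_features: int, density: float = 0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.k = max(1, int(density * self.weight.numel()))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep the K largest-magnitude weights; everything else is zeroed,
        # so the forward pass (and its gradient) only touches the sparse subset.
        flat = self.weight.abs().flatten()
        threshold = flat.topk(self.k).values.min()
        mask = (self.weight.abs() >= threshold).float()
        return nn.functional.linear(x, self.weight * mask, self.bias)

layer = TopKLinear(256, 64, density=0.1)
out = layer(torch.randn(8, 256))  # only ~10% of the weights participate
```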
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
- Generative Adversarial Transformers [13.633811200719627]
We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling.
The network employs a bipartite structure that enables long-range interactions across the image, while maintaining linear computational efficiency.
We show it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data-efficiency.
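A hedged sketch of the bipartite idea (a generic cross-attention illustration, not the GANsformer's exact duplex-attention implementation): a small set of k latent vectors attends to the N image-grid features and vice versa, so cost scales as O(N*k) rather than O(N^2) as in full self-attention over the image.

```python
import torch
import torch.nn as nn

class BipartiteAttention(nn.Module):
    """Cross-attention between k latents and N image features: O(N*k), not O(N^2)."""
    def __init__(self, dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.latents_to_image = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.image_to_latents = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, latents: torch.Tensor, image: torch.Tensor):
        # latents: (B, k, dim) global variables; image: (B, N, dim) grid features.
        latents, _ = self.latents_to_image(latents, image, image)  # latents read the image
        image, _ = self.image_to_latents(image, latents, latents)  # image reads the latents
        return latents, image

layer = BipartiteAttention()
latents, image = layer(torch.randn(2, 16, 64), torch.randn(2, 256, 64))
```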
arXiv Detail & Related papers (2021-03-01T18:54:04Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)