Neural architecture impact on identifying temporally extended
Reinforcement Learning tasks
- URL: http://arxiv.org/abs/2310.03161v1
- Date: Wed, 4 Oct 2023 21:09:19 GMT
- Title: Neural architecture impact on identifying temporally extended
Reinforcement Learning tasks
- Authors: Victor Vadakechirayath George
- Abstract summary: We present attention-based architectures in the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari-2600 game suite.
In attention-based models, extracting the attention map and overlaying it onto the input images allows direct observation of the information the agent uses to select actions.
In addition, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we propose a Vision Transformer-based architecture for the image-based RL domain as well.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by recent developments in attention models for image classification
and natural language processing, we present various attention-based
architectures in the reinforcement learning (RL) domain, capable of performing
well on the OpenAI Gym Atari-2600 game suite. Despite the recent success of
deep reinforcement learning techniques in fields such as robotics, gaming and
healthcare, they suffer from a major drawback: neural networks are difficult
to interpret. We address this problem with attention-based models, in which
extracting the attention map and overlaying it onto the input images allows
direct observation of the information the agent uses to select actions, and
easier interpretation of the logic behind those actions. In addition to playing
well in gym-Atari environments, our models also provide insight into how the
agent perceives its environment. Motivated by recent developments in
attention-based video-classification models using the Vision Transformer, we
also propose a Vision Transformer-based architecture for the image-based RL
domain. Compared to previous Vision Transformer works, our model is faster to
train and requires fewer computational resources.
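The attention-map overlay described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function name, the red-tint heatmap, and the small-frame preprocessing are assumptions for the example.

```python
# Hedged sketch: overlaying a self-attention map onto an Atari frame to
# visualize which pixels the agent attends to when selecting actions.
import numpy as np

def overlay_attention(frame: np.ndarray, attention: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """Blend a low-resolution attention map onto an RGB frame.

    frame:     (H, W, 3) uint8 image, e.g. a preprocessed Atari observation.
    attention: (h, w) non-negative attention weights, e.g. one head's
               softmax output reshaped to the patch grid.
    """
    # Normalize attention weights to [0, 1].
    att = attention.astype(np.float64)
    att = (att - att.min()) / (att.max() - att.min() + 1e-8)

    # Nearest-neighbour upsample the patch-grid map to the frame size.
    H, W = frame.shape[:2]
    rows = np.arange(H) * att.shape[0] // H
    cols = np.arange(W) * att.shape[1] // W
    att_up = att[rows[:, None], cols[None, :]]            # (H, W)

    # Tint attended regions red and alpha-blend with the original frame.
    heat = np.zeros_like(frame, dtype=np.float64)
    heat[..., 0] = att_up * 255.0
    blended = (1 - alpha * att_up[..., None]) * frame + alpha * att_up[..., None] * heat
    return blended.astype(np.uint8)
```

Regions with high attention weight are tinted, so the resulting image directly shows which parts of the screen informed the agent's action choice.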
Related papers
- Bridging Operator Learning and Conditioned Neural Fields: A Unifying Perspective [24.1795082775376]
Operator learning is an emerging area of machine learning which aims to learn mappings between infinite-dimensional function spaces.
We find that many commonly used operator learning models can be viewed as neural fields with conditioning mechanisms restricted to point-wise and/or global information.
Motivated by this, we propose the Continuous Vision Transformer (CViT), a novel neural operator architecture that employs a vision transformer encoder.
arXiv Detail & Related papers (2024-05-22T21:13:23Z)
- Vision Transformers Need Registers [26.63912173005165]
We identify and characterize artifacts in the feature maps of both supervised and self-supervised ViT networks.
We show that adding dedicated register tokens fixes this problem entirely for both supervised and self-supervised models.
arXiv Detail & Related papers (2023-09-28T16:45:46Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models such as CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Visual Prompt Tuning for Generative Transfer Learning [26.895321693202284]
We present a recipe for learning vision transformers by generative knowledge transfer.
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.
To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens, called prompts, to the image token sequence.
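The prompt-prepending step can be sketched as below, assuming a tokenizer that produces (batch, seq_len, dim) visual tokens; all names and sizes are illustrative, not taken from the paper.

```python
# Hedged sketch of visual prompt tuning: prepend learnable "prompt" tokens
# to the image token sequence; only the prompts are trained for a new domain.
import torch
import torch.nn as nn

class PromptedTokens(nn.Module):
    def __init__(self, num_prompts: int = 10, dim: int = 256):
        super().__init__()
        # Learnable prompt tokens, shared across all inputs.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (batch, seq_len, dim) tokens from a frozen tokenizer.
        batch = image_tokens.shape[0]
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prompts to the image token sequence.
        return torch.cat([prompts, image_tokens], dim=1)
```

Because the backbone stays frozen and only `self.prompts` receives gradients, adapting to a new domain touches a tiny fraction of the parameters.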
arXiv Detail & Related papers (2022-10-03T14:56:05Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.
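The idea of learning self-attention over the feature maps of a state representation can be sketched as follows, with illustrative layer sizes; this is an assumption-laden example, not the paper's architecture.

```python
# Hedged sketch: self-attention over CNN feature maps of an RL state.
# Each spatial position becomes a token attending to all others.
import torch
import torch.nn as nn

class FeatureMapAttention(nn.Module):
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: (batch, C, H, W) feature maps from the convolutional encoder.
        b, c, h, w = fmap.shape
        tokens = fmap.flatten(2).transpose(1, 2)      # (batch, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return attended.transpose(1, 2).view(b, c, h, w)
```

The attended feature maps keep the encoder's spatial layout, so they can feed the usual value or policy head unchanged.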
arXiv Detail & Related papers (2022-02-01T19:03:03Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- D2RL: Deep Dense Architectures in Reinforcement Learning [47.67475810050311]
We take inspiration from successful architectural choices in computer vision and generative modelling.
We investigate the use of deeper networks and dense connections for reinforcement learning on a variety of simulated robotic learning benchmark environments.
arXiv Detail & Related papers (2020-10-19T01:27:07Z)
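The dense connections investigated in the D2RL entry above can be sketched as a state-concatenating MLP; the class name, sizes and depth here are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of dense connections in an RL value/policy network:
# the raw state is concatenated to every hidden layer's input.
import torch
import torch.nn as nn

class DenseMLP(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 256, out_dim: int = 1, depth: int = 4):
        super().__init__()
        layers = [nn.Linear(state_dim, hidden)]
        for _ in range(depth - 1):
            # Each subsequent layer also sees the raw state (dense connection).
            layers.append(nn.Linear(hidden + state_dim, hidden))
        self.layers = nn.ModuleList(layers)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.layers[0](state))
        for layer in self.layers[1:]:
            x = torch.relu(layer(torch.cat([x, state], dim=-1)))
        return self.head(x)
```

Re-injecting the state at every layer preserves the input signal in deep networks, which is the motivation the abstract gives for borrowing dense connections from computer vision.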
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.