Neural architecture impact on identifying temporally extended
Reinforcement Learning tasks
- URL: http://arxiv.org/abs/2310.03161v1
- Date: Wed, 4 Oct 2023 21:09:19 GMT
- Title: Neural architecture impact on identifying temporally extended
Reinforcement Learning tasks
- Authors: Victor Vadakechirayath George
- Abstract summary: We present attention-based architectures for the reinforcement learning (RL) domain that perform well on the OpenAI Gym Atari-2600 game suite.
In attention-based models, extracting the attention map and overlaying it onto the input images allows direct observation of the information the agent uses to select actions.
In addition, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we also propose a Vision Transformer based architecture for the image-based RL domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by recent developments in attention models for image
classification and natural language processing, we present various
attention-based architectures for the reinforcement learning (RL) domain that
perform well on the OpenAI Gym Atari-2600 game suite. Despite the recent
success of deep reinforcement learning techniques in fields such as robotics,
gaming, and healthcare, they suffer from a major drawback: neural networks are
difficult to interpret. We address this problem with attention-based models, in
which extracting the attention map and overlaying it onto the input images
allows direct observation of the information the agent uses to select actions
and makes the logic behind the chosen actions easier to interpret. In addition
to playing well on gym-Atari environments, our models also provide insight into
how the agent perceives its environment. Furthermore, motivated by recent
developments in attention-based video-classification models using the Vision
Transformer, we also propose a Vision Transformer based architecture for the
image-based RL domain. Compared to previous Vision Transformer works, our model
is faster to train and requires fewer computational resources.
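The attention-map overlay described above can be made concrete with a small sketch. The code below is not the paper's architecture; it is a minimal PyTorch approximation (all module, parameter, and function names are hypothetical) of a convolutional encoder with a soft spatial-attention head, whose attention map is upsampled and blended onto the input frame so the pixels the agent attends to can be inspected directly.

```python
# Minimal sketch (hypothetical names): CNN encoder + soft spatial attention,
# with a helper that overlays the attention map on an Atari frame.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPolicy(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        # Standard Atari-style convolutional encoder (4-frame 84x84 stack).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # 1x1 conv scores every spatial location; softmax turns scores into a map.
        self.attn_score = nn.Conv2d(64, 1, kernel_size=1)
        self.policy = nn.Linear(64, n_actions)

    def forward(self, obs: torch.Tensor):
        feats = self.encoder(obs)                      # (B, 64, 7, 7)
        b, c, h, w = feats.shape
        scores = self.attn_score(feats).view(b, -1)    # (B, 49)
        attn = F.softmax(scores, dim=-1).view(b, 1, h, w)
        pooled = (feats * attn).sum(dim=(2, 3))        # attention-weighted pooling
        logits = self.policy(pooled)
        return logits, attn

def overlay(frame: torch.Tensor, attn: torch.Tensor, alpha: float = 0.6):
    """Upsample a (1, h, w) attention map to the frame size and blend it in."""
    heat = F.interpolate(attn.unsqueeze(0), size=frame.shape[-2:],
                         mode="bilinear", align_corners=False).squeeze(0)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    return alpha * heat + (1 - alpha) * frame
```

The overlaid heatmap can then be rendered next to the raw frame during evaluation, which is the kind of direct inspection of the agent's attended regions that the abstract refers to.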
Related papers
- ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models [55.07988373824348]
We study the visual generalization capabilities of three existing robotic foundation models.
Our study shows that the existing models do not exhibit robustness to visual out-of-domain scenarios.
We propose a gradual backbone reversal approach founded on model merging.
arXiv Detail & Related papers (2024-09-23T17:47:59Z)
- A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships [0.5639904484784127]
Transformer-based models have transformed the landscape of natural language processing (NLP).
These models are renowned for their ability to capture long-range dependencies and contextual information.
We discuss potential research directions and applications of transformer-based models in computer vision.
arXiv Detail & Related papers (2024-08-27T16:22:18Z)
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning [6.709078873834651]
Theia is a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks.
Theia's rich visual representations encode diverse visual knowledge, enhancing downstream robot learning.
arXiv Detail & Related papers (2024-07-29T17:08:21Z)
- Vision Transformers Need Registers [26.63912173005165]
We identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks.
We propose adding dedicated register tokens to the ViT input sequence and show that this solution fixes the problem entirely for both supervised and self-supervised models.
arXiv Detail & Related papers (2023-09-28T16:45:46Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models such as CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games; a rough sketch of this self-attention-over-feature-maps idea appears after this list.
arXiv Detail & Related papers (2022-02-01T19:03:03Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- D2RL: Deep Dense Architectures in Reinforcement Learning [47.67475810050311]
We take inspiration from successful architectural choices in computer vision and generative modelling.
We investigate the use of deeper networks and dense connections for reinforcement learning on a variety of simulated robotic learning benchmark environments.
arXiv Detail & Related papers (2020-10-19T01:27:07Z)
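The attention mechanism summarized in the "Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers" entry above can be sketched roughly as follows. This is not that paper's exact model; it is a minimal, assumption-laden PyTorch illustration (all names hypothetical) of self-attention applied to the spatial positions of a CNN feature map before a Q-value head.

```python
# Minimal sketch (hypothetical names): each spatial position of the CNN feature
# map becomes a token, self-attention relates the tokens, and the pooled result
# feeds a per-action Q-value head.
import torch
import torch.nn as nn

class AttentiveQNetwork(nn.Module):
    def __init__(self, n_actions: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Self-attention over the 7x7 = 49 spatial tokens of the feature map.
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.q_head = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                    nn.Linear(256, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(obs)                  # (B, 64, 7, 7)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 49, 64)
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)              # average over spatial tokens
        return self.q_head(pooled)                 # per-action Q-values
```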