Pixel to policy: DQN Encoders for within & cross-game reinforcement
learning
- URL: http://arxiv.org/abs/2308.00318v1
- Date: Tue, 1 Aug 2023 06:29:33 GMT
- Title: Pixel to policy: DQN Encoders for within & cross-game reinforcement
learning
- Authors: Ashrya Agrawal, Priyanshi Shah, Sourabh Prakash
- Abstract summary: Reinforcement Learning can be applied to a wide variety of tasks and environments.
Many environments share a similar structure, which can be exploited to improve RL performance on other tasks.
This work explores and compares the performance of RL models trained from scratch against models trained with different transfer learning approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning can be applied to a wide variety of tasks and
environments. Many of these environments share a similar structure, which can be
exploited to improve RL performance on other tasks. Transfer learning can take
advantage of this shared structure by learning policies that are transferable
across different tasks and environments, leading to more efficient learning as
well as improved performance on a wide range of tasks. This work explores and
compares the performance of RL models trained from scratch against models
trained with different transfer learning approaches. Additionally, the study
explores the performance of a model trained on multiple game environments, with
the goal of developing a universal game-playing agent, as well as transfer
learning with a pre-trained DQN encoder that is then trained on the same game or
a different game. Our DQN model achieves a mean episode reward of 46.16, beating
human-level performance with merely 20k episodes, significantly fewer than
DeepMind's 1M episodes. The achieved mean rewards of 533.42 and 402.17 on the
Assault and Space Invaders environments, respectively, represent noteworthy
performance on these challenging environments.
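The encoder-transfer recipe described in the abstract can be sketched in PyTorch. This is a minimal illustration, assuming the classic Nature-DQN convolutional encoder: the encoder is pre-trained on a source game, frozen, and a fresh Q-head is trained on the target game. The checkpoint path, action count, and hyperparameters are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class DQNEncoder(nn.Module):
    """Nature-DQN style convolutional encoder for stacked 84x84 Atari frames."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.conv(x)

class DQN(nn.Module):
    """Transferred encoder plus a fresh Q-head for the target game."""
    def __init__(self, encoder, n_actions):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.head(self.encoder(x))

encoder = DQNEncoder()
# Hypothetical checkpoint from pre-training on the source game.
encoder.load_state_dict(torch.load("source_game_encoder.pt"))

# Freeze the transferred encoder; only the new Q-head is optimised.
for p in encoder.parameters():
    p.requires_grad = False

model = DQN(encoder, n_actions=6)  # e.g. the target game's action set
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-4)
```

Whether the encoder is frozen or fine-tuned, and whether the target game matches the source game, are exactly the kinds of transfer variants such a study can compare.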
Related papers
- Probing Transfer in Deep Reinforcement Learning without Task Engineering [26.637254541454773]
We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents.
Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway.
We show that zero-shot transfer from the basic games to their variations is possible, but the variance in performance is also largely explained by interactions between factors.
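Zero-shot transfer of this kind reduces to evaluating a source-trained policy on a game variant without any fine-tuning. A minimal sketch using Gymnasium follows; the environment id and the random stand-in policy are illustrative, and ALE environments additionally require the ale-py package.

```python
import gymnasium as gym
import numpy as np

def evaluate_zero_shot(policy, env_id, episodes=10, seed=0):
    """Run a source-trained policy on an unseen variant, no fine-tuning."""
    env = gym.make(env_id)
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            action = policy(obs)  # greedy action from the source-game policy
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return float(np.mean(returns)), float(np.std(returns))

# Stand-in policy for illustration; a real study would load trained weights.
rng = np.random.default_rng(0)
random_policy = lambda obs: int(rng.integers(0, 6))
mean_ret, std_ret = evaluate_zero_shot(random_policy, "ALE/SpaceInvaders-v5")
```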
arXiv Detail & Related papers (2022-10-22T13:40:12Z)
- Pathfinding in Random Partially Observable Environments with Vision-Informed Deep Reinforcement Learning [1.332560004325655]
Deep reinforcement learning is a technique for solving problems in a variety of environments, ranging from Atari video games to stock trading.
This method leverages deep neural network models to make decisions based on observations of a given environment with the goal of maximizing a reward function that can incorporate cost and rewards for reaching goals.
In this work, multiple Deep Q-Network (DQN) agents are trained to operate in a partially observable environment with the goal of reaching a target zone in minimal travel time.
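The DQN update behind such agents is standard: regress Q(s, a) toward a one-step bootstrapped target computed with a periodically synced target network. A compact PyTorch sketch, where the toy networks and tensor shapes are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Standard one-step DQN TD loss over a replay batch."""
    obs, actions, rewards, next_obs, dones = batch
    # Q(s, a) for the actions that were actually taken.
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the (periodically synced) target network.
        next_q = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.smooth_l1_loss(q_sa, target)

# Illustrative usage with toy linear Q-networks over 4-dim observations:
q_net, target_net = torch.nn.Linear(4, 3), torch.nn.Linear(4, 3)
batch = (torch.randn(32, 4), torch.randint(0, 3, (32,)),
         torch.randn(32), torch.randn(32, 4), torch.zeros(32))
loss = dqn_loss(q_net, target_net, batch)
```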
arXiv Detail & Related papers (2022-09-11T06:32:00Z)
- Improving Experience Replay through Modeling of Similar Transitions' Sets [0.0]
We propose and evaluate a new reinforcement learning method, COMPact Experience Replay (COMPER).
Our objective is to reduce the number of experiences required for agent training with respect to the total reward accumulated in the long run.
We report detailed results from five training trials of COMPER for just 100,000 frames and about 25,000 iterations with a small experience memory.
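COMPER itself models sets of similar transitions with learned value estimates, which the summary above does not detail. As a loose, assumption-laden illustration of the underlying idea, keeping memory compact by merging similar transitions instead of storing duplicates, one might key transitions by a coarse state-action signature. This sketch is not the published algorithm.

```python
from collections import defaultdict
import numpy as np

class CompactReplay:
    """Illustrative compact memory: similar transitions share one slot."""
    def __init__(self, bins=16):
        self.bins = bins
        self.slots = defaultdict(lambda: {"n": 0, "reward": 0.0, "next": None})

    def _key(self, state, action):
        # Coarse discretisation of the state acts as a similarity signature.
        sig = tuple(np.floor(np.asarray(state) * self.bins).astype(int).ravel())
        return (sig, int(action))

    def add(self, state, action, reward, next_state):
        slot = self.slots[self._key(state, action)]
        slot["n"] += 1
        # Incremental mean of the reward over all merged transitions.
        slot["reward"] += (reward - slot["reward"]) / slot["n"]
        slot["next"] = next_state  # keep the most recent successor

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        keys = list(self.slots)
        idx = rng.choice(len(keys), size=min(batch_size, len(keys)), replace=False)
        return [self.slots[keys[i]] for i in idx]

buf = CompactReplay()
buf.add([0.120, 0.480], 1, 1.0, [0.15, 0.50])
buf.add([0.124, 0.481], 1, 0.0, [0.16, 0.51])  # merged into the same slot
assert len(buf.slots) == 1
```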
arXiv Detail & Related papers (2021-11-12T19:27:15Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
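ΨΦ-learning builds on successor features, where the key identity is that if rewards are linear in features, r(s, a) = φ(s, a)·w, then Q(s, a) = ψ(s, a)·w, with ψ the discounted sum of future features. A tiny numpy sketch of that identity, with all values randomly generated for illustration:

```python
import numpy as np

# Successor features: psi(s, a) = E[ sum_t gamma^t phi(s_t, a_t) ].
# If rewards are linear in the features, r(s, a) = phi(s, a) . w, then
# Q(s, a) = psi(s, a) . w, so one set of successor features serves any
# task whose reward weights w are known or inferred (as ITD does from demos).
rng = np.random.default_rng(0)

n_actions, feat_dim = 4, 8
psi = rng.normal(size=(n_actions, feat_dim))  # successor features at a state s
w = rng.normal(size=feat_dim)                 # reward weights for the task

q_values = psi @ w                  # Q(s, a) for every action via the identity
greedy_action = int(np.argmax(q_values))
```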
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [70.540936204654]
"Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
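Potential-based shaping has a standard, policy-invariance-preserving form: the shaped reward is r + γΦ(s′) − Φ(s) for any potential function Φ. A minimal sketch, with a hypothetical potential derived from similarity to poses extracted from running videos:

```python
import numpy as np

def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    This form provably preserves the optimal policy (Ng, Harada & Russell,
    1999); only the speed of learning changes.
    """
    return reward + gamma * phi_next - phi_s

# Hypothetical potential: negative distance between the agent's joint pose
# and a reference pose extracted from human running video at the matching
# phase of the gait cycle.
def potential(pose, reference_pose):
    return -float(np.linalg.norm(np.asarray(pose) - np.asarray(reference_pose)))

ref = [0.0, 1.0]
r_shaped = shaped_reward(0.1, potential([0.2, 1.1], ref),
                         potential([0.1, 1.05], ref))
```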
arXiv Detail & Related papers (2020-12-16T09:46:58Z)
- Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games such as Roguelikes present unique challenges to Deep Reinforcement Learning (DRL).
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
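The two architectures are not detailed in this summary. One common building block for large categorical state spaces, shown here purely as an illustrative assumption rather than the paper's design, is to learn an embedding per categorical variable and concatenate:

```python
import torch
import torch.nn as nn

class CategoricalStateEncoder(nn.Module):
    """Embeds each categorical state variable, then concatenates and projects.

    Purely illustrative for categorical game state (tile type, item id,
    enemy class, ...); not the architecture proposed in the paper.
    """
    def __init__(self, cardinalities, embed_dim=16, out_dim=128):
        super().__init__()
        self.embeddings = nn.ModuleList(
            nn.Embedding(n, embed_dim) for n in cardinalities
        )
        self.proj = nn.Linear(embed_dim * len(cardinalities), out_dim)

    def forward(self, x):
        # x: (batch, n_vars) integer-coded categorical state.
        parts = [emb(x[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.relu(self.proj(torch.cat(parts, dim=-1)))

encoder = CategoricalStateEncoder(cardinalities=[32, 10, 6])
state = torch.randint(0, 6, (8, 3))  # batch of 8 states, 3 categorical variables
features = encoder(state)            # -> (8, 128)
```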
arXiv Detail & Related papers (2020-12-07T08:47:25Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Dynamic Experience Replay [6.062589413216726]
We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks.
In particular, we run experiments on two different tasks: peg-in-hole and lap-joint.
Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that largely shortens the training time in these challenging environments.
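The specifics of Dynamic Experience Replay are not given in this summary; the general pattern it belongs to, mixing demonstration transitions into the replay buffer alongside the agent's own experience, can be sketched as follows. The fixed mixing fraction is an illustrative simplification, not the paper's mechanism.

```python
import random

class MixedReplay:
    """Samples batches that mix demonstrations with agent experience.

    Illustrative sketch: `demo_fraction` of each batch comes from a fixed
    demonstration buffer, the rest from the growing agent buffer.
    """
    def __init__(self, demos, capacity=100_000, demo_fraction=0.25):
        self.demos = list(demos)
        self.agent = []
        self.capacity = capacity
        self.demo_fraction = demo_fraction

    def add(self, transition):
        if len(self.agent) >= self.capacity:
            self.agent.pop(0)  # drop the oldest agent transition
        self.agent.append(transition)

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        n_agent = min(batch_size - n_demo, len(self.agent))
        return random.sample(self.demos, n_demo) + random.sample(self.agent, n_agent)

buffer = MixedReplay(demos=[("s0", 0, 1.0, "s1")] * 100)  # placeholder demos
buffer.add(("s1", 1, 0.0, "s2"))
batch = buffer.sample(4)
```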
arXiv Detail & Related papers (2020-03-04T23:46:45Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
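The episodic bonus in Never Give Up is, roughly, inversely related to the summed kernel similarity between the current embedding and its k nearest neighbours in an episodic memory. A simplified numpy sketch follows; the constants and normalisation are illustrative, not the paper's exact formulation.

```python
import numpy as np

def episodic_bonus(embedding, memory, k=10, eps=1e-3, c=1e-3):
    """Simplified NGU-style episodic intrinsic reward.

    The bonus shrinks as the current controllable-state embedding gets
    closer to its k nearest neighbours in this episode's memory.
    """
    if not memory:
        return 1.0
    d2 = np.sum((np.stack(memory) - embedding) ** 2, axis=1)  # squared distances
    d2 = np.sort(d2)[: min(k, len(d2))]                       # k nearest neighbours
    d2 = d2 / (d2.mean() + 1e-8)                              # normalise the scale
    kernel = eps / (d2 + eps)                                 # inverse-distance kernel
    return float(1.0 / np.sqrt(kernel.sum() + c))

# The embeddings come from a self-supervised inverse-dynamics model, so the
# novelty signal focuses on aspects of the state the agent can control.
rng = np.random.default_rng(0)
memory = [rng.normal(size=8) for _ in range(50)]
bonus = episodic_bonus(np.zeros(8), memory)
```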
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
- Model-Based Reinforcement Learning for Atari [89.3039240303797]
We show how video prediction models can enable agents to solve Atari games with fewer interactions than model-free methods.
Our experiments evaluate SimPLe on a range of Atari games in the low-data regime of 100k interactions between the agent and the environment.
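SimPLe itself uses deep video-prediction models, but the underlying recipe, collect a little real experience, fit a model of the environment, then improve the policy inside the model, can be shown with a runnable tabular toy. Everything here is a deliberately simplified analogue, not SimPLe.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
true_R = rng.normal(size=(n_states, n_actions))

# 1) Collect a small amount of real experience with a random policy.
counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
for _ in range(2_000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s2 = rng.choice(n_states, p=true_P[s, a])
    counts[s, a, s2] += 1
    reward_sum[s, a] += true_R[s, a]

# 2) Fit the world model: maximum-likelihood transition and reward estimates.
n_sa = counts.sum(axis=2, keepdims=True)
model_P = np.where(n_sa > 0, counts / np.maximum(n_sa, 1), 1.0 / n_states)
model_R = reward_sum / np.maximum(n_sa[..., 0], 1)

# 3) "Train in the model": value iteration uses only the learned model,
#    consuming no further real interactions.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    Q = model_R + gamma * model_P @ Q.max(axis=1)
policy = Q.argmax(axis=1)
```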
arXiv Detail & Related papers (2019-03-01T15:40:19Z)