Double A3C: Deep Reinforcement Learning on OpenAI Gym Games
- URL: http://arxiv.org/abs/2303.02271v1
- Date: Sat, 4 Mar 2023 00:06:27 GMT
- Title: Double A3C: Deep Reinforcement Learning on OpenAI Gym Games
- Authors: Yangxin Zhong, Jiajie He, and Lingjie Kong
- Abstract summary: Reinforcement Learning (RL) is an area of machine learning concerned with how agents take actions in an unknown environment to maximize their rewards.
We propose and implement Double A3C, an improved algorithm that combines the strengths of Double Q-learning and A3C to play OpenAI Gym Atari 2600 games and beat their benchmarks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) is an area of machine learning concerned with how
agents take actions in an unknown environment to maximize their rewards. Unlike
the classical Markov Decision Process (MDP), in which the agent has full knowledge
of its states, rewards, and transition probabilities, reinforcement learning uses
exploration and exploitation to handle model uncertainty. Because the model usually
has a large state space, a neural network (NN) can be used to map input states to
output actions that maximize the agent's rewards. However, building and training an
efficient neural network is challenging. Inspired by Double Q-learning and the
Asynchronous Advantage Actor-Critic (A3C) algorithm, we propose and implement
Double A3C, an improved algorithm that combines the strengths of both to play
OpenAI Gym Atari 2600 games and beat their benchmarks.
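The abstract does not spell out how the double estimation is wired into A3C, so the sketch below is only an illustration of the named ingredients, not the authors' implementation: a single-worker advantage actor-critic with two critic heads, where the more conservative of the two value estimates forms the bootstrap target in the style of Double Q-learning. The environment (CartPole-v1 instead of an Atari game), network sizes, hyperparameters, and the classic 4-tuple gym step API are assumptions made for brevity; the paper itself targets Atari 2600 games.

```python
# Rough sketch only, not the paper's implementation: a single-worker advantage
# actor-critic with two value heads, where the minimum of the two estimates is
# used as the bootstrap target (the "double" idea). Assumes the classic gym API
# (reset() -> obs, step() -> obs, reward, done, info).
import gym
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleACNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # actor head
        self.value1 = nn.Linear(hidden, 1)          # first critic head
        self.value2 = nn.Linear(hidden, 1)          # second critic head

    def forward(self, x):
        h = self.body(x)
        return self.policy(h), self.value1(h), self.value2(h)

env = gym.make("CartPole-v1")
net = DoubleACNet(env.observation_space.shape[0], env.action_space.n)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
gamma = 0.99

obs = env.reset()
for step in range(10_000):
    x = torch.as_tensor(obs, dtype=torch.float32)
    logits, v1, v2 = net(x)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()

    next_obs, reward, done, _ = env.step(action.item())

    with torch.no_grad():
        _, nv1, nv2 = net(torch.as_tensor(next_obs, dtype=torch.float32))
        # Double-style bootstrap: take the more conservative of the two critics.
        bootstrap = 0.0 if done else torch.min(nv1, nv2).item()
        target = float(reward) + gamma * bootstrap
    target_t = torch.tensor(target, dtype=torch.float32)

    # Advantage uses the conservative value estimate as the baseline.
    advantage = target - torch.min(v1, v2).item()

    policy_loss = -dist.log_prob(action) * advantage
    value_loss = F.mse_loss(v1.squeeze(), target_t) + F.mse_loss(v2.squeeze(), target_t)
    entropy_bonus = 0.01 * dist.entropy()

    loss = policy_loss + 0.5 * value_loss - entropy_bonus
    opt.zero_grad()
    loss.backward()
    opt.step()

    obs = env.reset() if done else next_obs
```

Scaling this toward the paper's setting would mean replacing the linear body with a convolutional network over stacked Atari frames and running several such workers asynchronously against shared parameters, as in the original A3C.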
Related papers
- Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services.
These services require executing GenAI models with billions of parameters, posing significant obstacles to the resource-limited wireless edge.
We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z) - Mastering Chinese Chess AI (Xiangqi) Without Search [2.309569018066392]
We have developed a high-performance Chinese Chess AI that operates without reliance on search algorithms.
This AI has demonstrated the capability to compete at a level commensurate with the top 0.1% of human players.
arXiv Detail & Related papers (2024-10-07T09:27:51Z) - Sup3r: A Semi-Supervised Algorithm for increasing Sparsity, Stability, and Separability in Hierarchy Of Time-Surfaces architectures [3.533874233403883]
Sup3r enhances sparsity, stability, and separability in the HOTS networks.
Sup3r learns class-informative patterns, mitigates confounding features, and reduces the number of processed events.
Preliminary results on N-MNIST demonstrate that Sup3r achieves comparable accuracy to similarly sized Artificial Neural Networks trained with back-propagation.
arXiv Detail & Related papers (2024-04-15T09:33:19Z) - Human-Timescale Adaptation in an Open-Ended Task Space [56.55530165036327]
We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.
Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
arXiv Detail & Related papers (2023-01-18T15:39:21Z) - Learning Algorithms for Intelligent Agents and Mechanisms [4.251500966181852]
In this thesis, we research learning algorithms for optimal decision making in two different contexts, Reinforcement Learning in Part I and Auction Design in Part II.
In Chapter 2, inspired by statistical physics, we develop a novel approach to Reinforcement Learning (RL) that not only learns optimal policies with enhanced desirable properties but also sheds new light on maximum entropy RL.
In Chapter 3, we tackle the generalization problem in RL from a Bayesian perspective. We show that imperfect knowledge of the environment's dynamics effectively turns a fully-observed Markov Decision Process (MDP) into a Partially Observed MDP (POMDP).
arXiv Detail & Related papers (2022-10-06T03:12:43Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Chrome Dino Run using Reinforcement Learning [0.0]
We study the most popular model-free reinforcement learning algorithms, together with a convolutional neural network, to train an agent to play the Chrome Dino Run game.
We use two popular temporal-difference approaches, Deep Q-Learning and Expected SARSA, and also implement a Double DQN model to train the agent (a sketch of the Double DQN target appears after this list).
arXiv Detail & Related papers (2020-08-15T22:18:20Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
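Several entries above lean on the same double-estimation idea that motivates the main paper (Double DQN in the Chrome Dino Run work, Double Q-learning here). As a generic illustration rather than code from any of the listed papers, the snippet below shows the standard Double DQN target: the online network selects the greedy action and a separate target network evaluates it, which reduces the overestimation bias of vanilla Q-learning. The function name and tensor shapes are illustrative assumptions.

```python
# Illustrative sketch of the standard Double DQN target (not taken from the
# papers above). online_net and target_net are assumed to map a batch of
# observations to Q-values of shape (batch, n_actions).
import torch

def double_dqn_target(online_net, target_net, next_obs, rewards, dones, gamma=0.99):
    """r + gamma * Q_target(s', argmax_a Q_online(s', a)), zeroed at terminal states."""
    with torch.no_grad():
        # The online network picks the greedy next action...
        best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates it, decoupling selection from evaluation.
        next_q = target_net(next_obs).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```

This decoupling of action selection from action evaluation is what "Double" refers to in Double DQN, and presumably the idea the main paper adapts to the actor-critic setting.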
This list is automatically generated from the titles and abstracts of the papers on this site.