RAMario: Experimental Approach to Reptile Algorithm -- Reinforcement
Learning for Mario
- URL: http://arxiv.org/abs/2305.09655v1
- Date: Tue, 16 May 2023 17:54:14 GMT
- Title: RAMario: Experimental Approach to Reptile Algorithm -- Reinforcement
Learning for Mario
- Authors: Sanyam Jain
- Abstract summary: We implement the Reptile algorithm using the Super Mario Bros library and weights in Python, creating a neural network model.
We train the model using multiple tasks and episodes, choosing actions using the current neural network model, taking those actions in the environment, and updating the model using the Reptile algorithm.
Our results demonstrate that the Reptile algorithm provides a promising approach to few-shot learning in video game AI, with performance comparable to or even better than the other two algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research paper presents an experimental approach to using the Reptile
algorithm for reinforcement learning to train a neural network to play Super
Mario Bros. We implement the Reptile algorithm using the Super Mario Bros Gym
library and TensorFlow in Python, creating a neural network model with a single
convolutional layer, a flatten layer, and a dense layer. We define the
optimizer and use the Reptile class to create an instance of the Reptile
meta-learning algorithm. We train the model using multiple tasks and episodes,
choosing actions using the current weights of the neural network model, taking
those actions in the environment, and updating the model weights using the
Reptile algorithm. We evaluate the performance of the algorithm by printing the
total reward for each episode. In addition, we compare the performance of the
Reptile algorithm approach to two other popular reinforcement learning
algorithms, Proximal Policy Optimization (PPO) and Deep Q-Network (DQN),
applied to the same Super Mario Bros task. Our results demonstrate that the
Reptile algorithm provides a promising approach to few-shot learning in video
game AI, with performance comparable to or even better than the other two
algorithms, particularly in terms of the number of moves versus the distance the
agent covers over 1M episodes of training. The results show that the best total
distances for world 1-2 in the game environment were ~1732 (PPO), ~1840 (DQN), and ~2300 (RAMario).
Full code is available at https://github.com/s4nyam/RAMario.
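To make the training recipe in the abstract concrete, the following is a minimal, hypothetical sketch of a Reptile-style loop over Super Mario Bros stages. It assumes the gym-super-mario-bros/nes-py packages are the "Super Mario Bros Gym library" mentioned above, uses a single-convolution Keras policy, a one-step REINFORCE-style inner update, and prints the total reward per episode. The task list, hyperparameters, and the inner update rule are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Illustrative Reptile-style sketch for Super Mario Bros (not the authors' code).
import numpy as np
import tensorflow as tf
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

def build_policy(num_actions):
    # Single conv layer -> flatten -> dense, as described in the abstract.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 8, strides=4, activation="relu",
                               input_shape=(240, 256, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_actions),
    ])

def run_episode(env, model, optimizer):
    # One episode with a crude one-step policy-gradient inner update
    # (an assumption; the paper only states that the model is updated with Reptile).
    obs = env.reset()  # classic Gym API used by nes-py environments
    done, total_reward = False, 0.0
    while not done:
        x = tf.convert_to_tensor(obs[None].astype(np.float32) / 255.0)
        with tf.GradientTape() as tape:
            logits = model(x)
            action = int(tf.random.categorical(logits, num_samples=1)[0, 0])
            log_prob = tf.nn.log_softmax(logits)[0, action]
            obs, reward, done, _ = env.step(action)
            loss = -log_prob * reward  # one-step REINFORCE-style proxy loss
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        total_reward += reward
    return total_reward

# Tasks = different world/stage combinations (a hypothetical task set).
task_ids = ["SuperMarioBros-1-1-v0", "SuperMarioBros-1-2-v0"]
model = build_policy(len(SIMPLE_MOVEMENT))
inner_optimizer = tf.keras.optimizers.Adam(1e-4)
meta_step_size = 0.1  # Reptile epsilon; the value is an assumption

theta = [w.copy() for w in model.get_weights()]  # meta-parameters
for meta_iter in range(10):  # far fewer iterations than the paper's 1M episodes
    for task in task_ids:
        env = JoypadSpace(gym_super_mario_bros.make(task), SIMPLE_MOVEMENT)
        model.set_weights(theta)  # each inner loop starts from the meta-parameters
        total_reward = run_episode(env, model, inner_optimizer)
        print(f"meta_iter={meta_iter} task={task} total_reward={total_reward:.1f}")
        phi = model.get_weights()
        # Reptile meta-update: theta <- theta + epsilon * (phi - theta)
        theta = [t + meta_step_size * (p - t) for t, p in zip(theta, phi)]
        env.close()
model.set_weights(theta)
```

The meta-update line, theta <- theta + epsilon * (phi - theta), is what distinguishes Reptile from plain per-task training: the meta-parameters move a small step toward the weights reached after adapting to each task.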
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Performance and Energy Consumption of Parallel Machine Learning Algorithms [0.0]
Machine learning models have achieved remarkable success in various real-world applications.
Model training in machine learning requires large-scale data sets and multiple iterations before it can work properly.
Parallelization of training algorithms is a common strategy to speed up the process of training.
arXiv Detail & Related papers (2023-05-01T13:04:39Z)
- Tricks and Plugins to GBM on Images and Sequences [18.939336393665553]
We propose a new algorithm for boosting Deep Convolutional Neural Networks (BoostCNN) to combine the merits of dynamic feature selection and BoostCNN.
We also propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least squares objective function.
Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks.
arXiv Detail & Related papers (2022-03-01T21:59:00Z)
- Shedding some light on Light Up with Artificial Intelligence [0.3867363075280543]
The Light-Up puzzle, also known as the AKARI puzzle, has never been solved using modern artificial intelligence (AI) methods.
This project is an effort to apply new AI techniques to solve the Light-Up puzzle faster and in a more computationally efficient way.
arXiv Detail & Related papers (2021-07-22T03:03:57Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [70.540936204654]
"Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
arXiv Detail & Related papers (2020-12-16T09:46:58Z)
- Chrome Dino Run using Reinforcement Learning [0.0]
We study the most popular model-free reinforcement learning algorithms, combined with a convolutional neural network, to train an agent to play the Chrome Dino Run game.
We use two popular temporal-difference approaches, namely Deep Q-Learning and Expected SARSA, and also implement a Double DQN model to train the agent.
arXiv Detail & Related papers (2020-08-15T22:18:20Z)
- TAdam: A Robust Stochastic Gradient Optimizer [6.973803123972298]
Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain.
To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed.
We propose a new gradient optimization method whose robustness is built directly into the algorithm, using the robust Student-t distribution as its core idea.
arXiv Detail & Related papers (2020-02-29T04:32:36Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
- Model-Based Reinforcement Learning for Atari [89.3039240303797]
We show how video prediction models can enable agents to solve Atari games with fewer interactions than model-free methods.
Our experiments evaluate SimPLe on a range of Atari games in the low-data regime of 100k interactions between the agent and the environment.
arXiv Detail & Related papers (2019-03-01T15:40:19Z)