Maximum Mutation Reinforcement Learning for Scalable Control
- URL: http://arxiv.org/abs/2007.13690v7
- Date: Sat, 16 Jan 2021 23:51:53 GMT
- Title: Maximum Mutation Reinforcement Learning for Scalable Control
- Authors: Karush Suri, Xiao Qi Shi, Konstantinos N. Plataniotis, Yuri A.
Lawryshyn
- Abstract summary: Reinforcement Learning (RL) has demonstrated data efficiency and optimal control over large state spaces at the cost of scalable performance.
We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm.
- Score: 25.935468948833073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in Reinforcement Learning (RL) have demonstrated data efficiency and
optimal control over large state spaces at the cost of scalable performance.
Genetic methods, on the other hand, provide scalability but exhibit sensitivity to the
hyperparameters of their evolutionary operations. However, a
combination of the two methods has recently demonstrated success in scaling RL
agents to high-dimensional action spaces. Parallel to recent developments, we
present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm.
We abstract exploration from exploitation by combining Evolution Strategies
(ES) with Soft Actor-Critic (SAC). Through this lens, we enable dominant skill
transfer between offspring by making use of soft winner selections and genetic
crossovers in hindsight and simultaneously improve hyperparameter sensitivity
in evolutions using the novel Automatic Mutation Tuning (AMT). AMT gradually
replaces the entropy framework of SAC, allowing the population to succeed at the
task while acting as randomly as possible, without making use of
backpropagation updates. In a study of challenging locomotion tasks consisting
of high-dimensional action spaces and sparse rewards, ESAC demonstrates
improved performance and sample efficiency in comparison to the Maximum Entropy
framework. Additionally, ESAC presents efficacious use of hardware resources
and algorithm overhead. A complete implementation of ESAC can be found at
karush17.github.io/esac-web/.
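The abstract describes three mechanisms: ES-style population search, soft winner selection with hindsight crossover, and Automatic Mutation Tuning in place of backprop-driven entropy updates. A minimal sketch of that loop is below; it is not the authors' implementation. The toy quadratic objective, the softmax-weighted recombination, and the simplified multiplicative AMT rule (widen the mutation scale on stagnation, shrink it on improvement) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):
    # Toy objective standing in for episode return: optimum at theta = 1.
    return -np.sum((theta - 1.0) ** 2)

def soft_winner_weights(scores, temperature=1.0):
    # Soft winner selection: softmax over fitness instead of hard truncation.
    z = (scores - scores.max()) / temperature
    w = np.exp(z)
    return w / w.sum()

def es_step(population, sigma):
    scores = np.array([fitness(p) for p in population])
    w = soft_winner_weights(scores)
    # Crossover in hindsight (simplified): recombine the whole population
    # weighted by the soft selection probabilities.
    parent = (w[:, None] * population).sum(axis=0)
    # Mutate offspring around the recombined parent; no backprop anywhere.
    offspring = parent + sigma * rng.standard_normal(population.shape)
    return offspring, scores.max()

def train(dim=5, pop_size=32, steps=100, sigma=0.5, amt_up=1.02, amt_down=0.9):
    population = rng.standard_normal((pop_size, dim))
    best = -np.inf
    for _ in range(steps):
        population, gen_best = es_step(population, sigma)
        # AMT-like rule (assumed, simplified): widen mutation when progress
        # stalls, shrink it while the population is still improving.
        sigma = sigma * (amt_up if gen_best <= best else amt_down)
        best = max(best, gen_best)
    return best
```

On the toy task the adaptive mutation scale plays the role the abstract assigns to AMT: it keeps the population "acting as randomly as possible" when fitness plateaus, without gradient updates.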
Related papers
- MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
Adaptive gradient algorithms like Adam and its variants have been central to the development of this type of training.
We propose a framework that reconciles preconditioned gradient optimization methods with variance reduction via a scaled momentum technique.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
- Active learning for energy-based antibody optimization and enhanced screening [0.0]
We propose an active learning workflow that efficiently trains a deep learning model to learn energy functions for specific targets.
In a case study targeting HER2-binding Trastuzumab mutants, our approach significantly improved the screening performance over random selection.
arXiv Detail & Related papers (2024-09-17T08:01:58Z)
- Trackable Agent-based Evolution Models at Wafer Scale [0.0]
We focus on the problem of extracting phylogenetic information from agent-based evolution on the 850,000 processor Cerebras Wafer Scale Engine (WSE).
We present an asynchronous island-based genetic algorithm (GA) framework for WSE hardware.
We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions.
arXiv Detail & Related papers (2024-04-16T19:24:14Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality [65.67315418971688]
Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR) are proposed.
Experiments on visual recognition demonstrate that our methods can simultaneously improve the covariance conditioning and generalization.
arXiv Detail & Related papers (2022-07-05T15:39:29Z)
- Transformers are Meta-Reinforcement Learners [0.060917028769172814]
We present TrMRL, a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture.
We show that the self-attention computes a consensus representation that minimizes the Bayes Risk at each layer.
Results show that TrMRL presents comparable or superior performance, sample efficiency, and out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-14T06:21:13Z)
- Hyper-Learning for Gradient-Based Batch Size Adaptation [2.944323057176686]
Scheduling the batch size to increase is an effective strategy to control noise when training deep neural networks.
We introduce Arbiter as a new hyper-optimization algorithm to perform batch size adaptations for learnable schedulings.
We demonstrate Arbiter's effectiveness in several illustrative experiments.
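The premise of this entry, growing the batch size over training to control gradient noise, can be shown with a plain step-based schedule. This is a generic sketch of the idea, not Arbiter's gradient-based adaptation; the function name and the doubling parameters are assumptions for illustration.

```python
def batch_size_schedule(step, base=32, growth=2, every=1000, cap=1024):
    """Double the batch size every `every` steps, capped at `cap`.

    Increasing the batch reduces gradient noise as training progresses,
    playing a role similar to decaying the learning rate.
    """
    return min(cap, base * growth ** (step // every))
```

A learnable scheduling, as proposed in the paper, would replace this fixed rule with parameters tuned by a hyper-optimizer.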
arXiv Detail & Related papers (2022-05-17T11:01:14Z)
- Direct Mutation and Crossover in Genetic Algorithms Applied to Reinforcement Learning Tasks [0.9137554315375919]
This paper will focus on applying neuroevolution using a simple genetic algorithm (GA) to find the weights of a neural network that produce optimally behaving agents.
We present two novel modifications that improve the data efficiency and speed of convergence when compared to the initial implementation.
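The simple-GA setting this entry describes, direct mutation and crossover on a network's weights to evolve well-behaving agents, can be sketched as follows. The one-layer "network", the stand-in reward function, and the elitism parameters are illustrative assumptions, not the paper's modifications.

```python
import numpy as np

rng = np.random.default_rng(1)

def policy(weights, obs):
    # Tiny linear "network": one weight matrix mapping observation to action.
    return np.tanh(obs @ weights)

def episode_return(weights):
    # Stand-in environment: reward the policy for outputting +1 on a fixed input.
    obs = np.ones(4)
    action = policy(weights, obs)
    return -np.sum((action - 1.0) ** 2)

def mutate(weights, sigma=0.1):
    # Direct mutation: Gaussian noise added straight to the weights.
    return weights + sigma * rng.standard_normal(weights.shape)

def crossover(a, b):
    # Uniform crossover: each weight is inherited from either parent.
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def run_ga(pop_size=20, elites=4, generations=60):
    population = [rng.standard_normal((4, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=episode_return, reverse=True)
        parents = ranked[:elites]
        # Elitism: keep the best agents unchanged, refill the rest with
        # crossover of two random elites followed by mutation.
        children = []
        while len(children) < pop_size - elites:
            i, j = rng.choice(elites, size=2, replace=False)
            children.append(mutate(crossover(parents[i], parents[j])))
        population = parents + children
    return max(episode_return(w) for w in population)
```

No gradients are used at any point; fitness ranking alone drives the search, which is what makes this family of methods easy to parallelize.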
arXiv Detail & Related papers (2022-01-13T07:19:28Z)
- Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
- Adam revisited: a weighted past gradients perspective [57.54752290924522]
We propose a novel adaptive method weighted adaptive algorithm (WADA) to tackle the non-convergence issues.
We prove that WADA can achieve a weighted data-dependent regret bound, which could be better than the original regret bound of ADAGRAD.
arXiv Detail & Related papers (2021-01-01T14:01:52Z)
- EOS: a Parallel, Self-Adaptive, Multi-Population Evolutionary Algorithm for Constrained Global Optimization [68.8204255655161]
EOS is a global optimization algorithm for constrained and unconstrained problems of real-valued variables.
It implements a number of improvements to the well-known Differential Evolution (DE) algorithm.
Results show that EOS is capable of achieving increased performance compared to state-of-the-art single-population self-adaptive DE algorithms.
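EOS builds on Differential Evolution; the classic DE/rand/1/bin step it improves upon looks like this. The sphere objective and the fixed F and CR values are illustrative assumptions; EOS itself, per the abstract, self-adapts its parameters across multiple populations.

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere(x):
    # Simple unconstrained test objective to minimize.
    return np.sum(x ** 2)

def de_optimize(dim=5, pop_size=20, generations=100, F=0.8, CR=0.9):
    """Classic DE/rand/1/bin: mutation v = a + F*(b - c), binomial crossover."""
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fit = np.array([sphere(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than i.
            a, b, c = pop[rng.choice(
                [k for k in range(pop_size) if k != i], 3, replace=False)]
            mutant = a + F * (b - c)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            tf = sphere(trial)
            if tf <= fit[i]:  # greedy one-to-one survivor selection
                pop[i], fit[i] = trial, tf
    return fit.min()
```

A multi-population variant in the spirit of EOS would run several such populations in parallel and periodically migrate individuals between them.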
arXiv Detail & Related papers (2020-07-09T10:19:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.