Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and
Stability
- URL: http://arxiv.org/abs/2204.04292v3
- Date: Mon, 24 Apr 2023 20:18:58 GMT
- Title: Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and
Stability
- Authors: Juan Jose Garau-Luis, Yingjie Miao, John D. Co-Reyes, Aaron Parisi,
Jie Tan, Esteban Real, Aleksandra Faust
- Abstract summary: Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
- Score: 67.8426046908398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalizability and stability are two key objectives for operating
reinforcement learning (RL) agents in the real world. Designing RL algorithms
that optimize these objectives can be a costly and painstaking process. This
paper presents MetaPG, an evolutionary method for automated design of
actor-critic loss functions. MetaPG explicitly optimizes for generalizability
and performance, and implicitly optimizes the stability of both metrics. We
initialize our loss function population with Soft Actor-Critic (SAC) and
perform multi-objective optimization using fitness metrics encoding single-task
performance, zero-shot generalizability to unseen environment configurations,
and stability across independent runs with different random seeds. On a set of
continuous control tasks from the Real-World RL Benchmark Suite, we find that
our method, using a single environment during evolution, evolves algorithms
that improve upon SAC's performance and generalizability by 4% and 20%,
respectively, and reduce instability by up to 67%. Then, we scale up to more
complex environments from the Brax physics simulator and replicate
generalizability tests encountered in practical settings, such as different
friction coefficients. MetaPG evolves algorithms that can obtain 10% better
generalizability without loss of performance within the same meta-training
environment and obtain similar results to SAC when doing cross-domain
evaluations in other Brax environments. The evolution results are
interpretable; by analyzing the structure of the best algorithms we identify
elements that help optimize certain objectives, such as regularization terms
for the critic loss.
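To make the search described in the abstract concrete, below is a minimal Python sketch of a Pareto-based evolutionary loop over candidate loss functions, scored with the three fitness metrics the abstract names: single-task performance, zero-shot generalizability to unseen configurations, and stability across random seeds. Everything in it (the Candidate encoding, the train_and_eval placeholder, the population and seed counts) is an illustrative assumption rather than MetaPG's actual implementation, which evolves computation graphs of actor-critic losses starting from SAC.

```python
# Illustrative sketch only: a MetaPG-style multi-objective evolutionary search
# over candidate actor-critic loss functions. All names and numbers here
# (Candidate, train_and_eval, NUM_SEEDS, ...) are assumptions for illustration,
# not the paper's implementation.
import random
import statistics
from dataclasses import dataclass

NUM_SEEDS = 3        # independent runs used to score stability
POPULATION_SIZE = 8
GENERATIONS = 5

@dataclass
class Candidate:
    genome: list           # stand-in encoding of a loss function (the paper evolves computation graphs)
    fitness: tuple = None  # (performance, generalizability, stability), all "higher is better"

def train_and_eval(genome, seed, unseen_config=False):
    """Placeholder for training an RL agent with this loss and returning its
    episode return; a deterministic pseudo-score keeps the sketch runnable."""
    rng = random.Random(hash(tuple(genome)) ^ seed ^ (17 if unseen_config else 0))
    return sum(genome) + rng.gauss(0.0, 0.5)

def evaluate(cand):
    """Three fitness metrics from the abstract: single-task performance,
    zero-shot generalizability to unseen configurations, and stability
    (negated spread across seeds, so higher is better)."""
    train = [train_and_eval(cand.genome, s) for s in range(NUM_SEEDS)]
    unseen = [train_and_eval(cand.genome, s, unseen_config=True) for s in range(NUM_SEEDS)]
    cand.fitness = (statistics.mean(train),
                    statistics.mean(unseen),
                    -statistics.pstdev(train + unseen))

def dominates(a, b):
    """Pareto dominance: no worse in every objective, strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(pop):
    return [c for c in pop
            if not any(dominates(o.fitness, c.fitness) for o in pop if o is not c)]

def mutate(cand):
    return Candidate([g + random.gauss(0.0, 0.1) for g in cand.genome])

# The paper seeds the population with SAC's loss graph; small random vectors
# stand in for that warm start here.
population = [Candidate([random.gauss(0.0, 0.1) for _ in range(4)])
              for _ in range(POPULATION_SIZE)]
for cand in population:
    evaluate(cand)

for _ in range(GENERATIONS):
    front = pareto_front(population)
    children = [mutate(random.choice(front)) for _ in range(POPULATION_SIZE - len(front))]
    for child in children:
        evaluate(child)
    population = front + children

for cand in pareto_front(population):
    print(tuple(round(f, 2) for f in cand.fitness))
```

Note that the sketch folds stability in as a third explicit objective purely for illustration; the paper optimizes performance and generalizability explicitly and treats the stability of both metrics implicitly.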
Related papers
- Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations [20.809499420384256]
STORM-based algorithms have been widely developed to solve one to $K$-level ($K \geq 3$) optimization problems.
This paper provides a comprehensive analysis of three representative STORM-based algorithms.
arXiv Detail & Related papers (2024-07-07T07:07:04Z) - Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minima optimizers such as SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z) - An Invariant Information Geometric Method for High-Dimensional Online
Optimization [9.538618632613714]
We introduce a fully invariance-oriented evolution strategies algorithm, SynCMA, derived from the corresponding framework.
We benchmark SynCMA against leading algorithms in Bayesian optimization and evolution strategies.
In all scenarios, SynCMA demonstrates great competence, if not dominance, over other algorithms in sample efficiency.
arXiv Detail & Related papers (2024-01-03T07:06:26Z) - Advancements in Optimization: Adaptive Differential Evolution with
Diversification Strategy [0.0]
The study employs single-objective optimization in a two-dimensional space and runs ADEDS on each of the benchmark functions with multiple iterations.
ADEDS consistently outperforms standard DE for a variety of optimization challenges, including functions with numerous local optima, plate-shaped, valley-shaped, stretched-shaped, and noisy functions.
arXiv Detail & Related papers (2023-10-02T10:05:41Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem in machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel sample-efficient gradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Generalized Reinforcement Meta Learning for Few-Shot Optimization [3.7675996866306845]
We present a generic and flexible Reinforcement Learning (RL) based meta-learning framework for the problem of few-shot learning.
Our framework can easily be extended to perform network architecture search.
arXiv Detail & Related papers (2020-05-04T03:21:05Z) - Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm that incorporates adaptive regularization and is applicable to machine learning problems in deep learning.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proven to simultaneously achieve the best available convergence for non-PL objectives while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)