Towards Automatic Actor-Critic Solutions to Continuous Control
- URL: http://arxiv.org/abs/2106.08918v1
- Date: Wed, 16 Jun 2021 16:18:20 GMT
- Title: Towards Automatic Actor-Critic Solutions to Continuous Control
- Authors: Jake Grigsby, Jin Yong Yoo, Yanjun Qi
- Abstract summary: This paper creates an evolutionary approach that tunes actor-critic algorithms to new domains.
Our design is sample efficient and provides practical advantages over baseline approaches.
We then apply it to new control tasks to find high-performance solutions with minimal compute and research effort.
- Score: 7.312692481631664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-free off-policy actor-critic methods are an efficient solution to
complex continuous control tasks. However, these algorithms rely on a number of
design tricks and many hyperparameters, making their applications to new
domains difficult and computationally expensive. This paper creates an
evolutionary approach that automatically tunes these design decisions and
eliminates the RL-specific hyperparameters from the Soft Actor-Critic
algorithm. Our design is sample efficient and provides practical advantages
over baseline approaches, including improved exploration, generalization over
multiple control frequencies, and a robust ensemble of high-performance
policies. Empirically, we show that our agent outperforms well-tuned
hyperparameter settings in popular benchmarks from the DeepMind Control Suite.
We then apply it to new control tasks to find high-performance solutions with
minimal compute and research effort.
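For intuition, the high-level loop described in the abstract — maintaining a population of agents whose genomes encode SAC-style design choices, training and evaluating them, and keeping the strongest performers as an ensemble — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hyperparameter names, value ranges, and the `train_and_eval` helper are assumptions standing in for a full Soft Actor-Critic training routine.

```python
import random

# Hypothetical search space over SAC-style design choices; the paper's
# actual genome and value ranges may differ.
SEARCH_SPACE = {
    "actor_lr":             lambda: 10 ** random.uniform(-5, -3),
    "critic_lr":            lambda: 10 ** random.uniform(-5, -3),
    "gamma":                lambda: random.uniform(0.95, 0.999),
    "target_entropy_scale": lambda: random.uniform(0.5, 2.0),
    "hidden_size":          lambda: random.choice([128, 256, 512]),
}


def random_genome():
    """Sample one full set of hyperparameters."""
    return {name: sample() for name, sample in SEARCH_SPACE.items()}


def mutate(genome, rate=0.3):
    """Resample each gene independently with probability `rate`."""
    return {
        name: (SEARCH_SPACE[name]() if random.random() < rate else value)
        for name, value in genome.items()
    }


def evolve(train_and_eval, population_size=8, generations=10, ensemble_size=3):
    """Tune hyperparameters by mutation plus truncation selection.

    `train_and_eval(genome)` is assumed to train an agent with the given
    hyperparameters and return `(average_return, policy)`.
    """
    population = [random_genome() for _ in range(population_size)]
    ensemble = []
    for _ in range(generations):
        # Score every genome, best first.
        scored = sorted(
            ((train_and_eval(g), g) for g in population),
            key=lambda item: item[0][0],
            reverse=True,
        )
        elites = [genome for (_, genome) in scored[: population_size // 2]]
        ensemble = [policy for ((_, policy), _) in scored[:ensemble_size]]
        # Refill the population with mutated copies of the elites.
        population = elites + [
            mutate(random.choice(elites)) for _ in range(population_size - len(elites))
        ]
    return ensemble
```

In the paper itself, several RL-specific hyperparameters are eliminated from SAC rather than searched over, and improved exploration and robustness across control frequencies come from the evolved agents themselves, so the genome above should be read as illustrative only.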
Related papers
- MOSEAC: Streamlined Variable Time Step Reinforcement Learning [14.838483990647697]
We introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method.
MOSEAC features an adaptive reward scheme based on observed trends in task rewards during training.
We validate the MOSEAC method through simulations in a Newtonian kinematics environment.
arXiv Detail & Related papers (2024-06-03T16:51:57Z)
- Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose the Adaptive $Q$-Network (AdaQN) to account for the non-stationarity of the optimization procedure without requiring additional samples.
AdaQN is theoretically sound, and we validate it empirically on MuJoCo control problems and Atari 2600 games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that adaptive control resolution, in combination with value decomposition, yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training [0.0]
We propose a practical method for robustly tuning large models.
CARBS performs local search around the performance-cost frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
arXiv Detail & Related papers (2023-06-13T18:22:24Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, which frames single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods (a rough sketch of this decoupled critic appears after this list).
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Hyperparameter Tuning for Deep Reinforcement Learning Applications [0.3553493344868413]
We propose a distributed variable-length genetic algorithm framework to tune hyperparameters for various RL applications.
Our results show that, with more generations, the framework finds optimal solutions that require fewer training episodes, are computationally cheaper, and are more robust for deployment.
arXiv Detail & Related papers (2022-01-26T20:43:13Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Multi-Level Evolution Strategies for High-Resolution Black-Box Control [0.2320417845168326]
This paper introduces a multi-level (m-lev) mechanism into Evolution Strategies (ESs).
It addresses a class of global optimization problems that could benefit from fine discretization of their decision variables.
arXiv Detail & Related papers (2020-10-04T09:24:40Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
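As a note on the critic-only line of work above (the "Growing Q-Networks" and "Solving Continuous Control via Q-learning" entries), the decoupled bang-bang critic they describe can be sketched roughly as follows. This is an illustrative reconstruction, not code from either paper; the module name, layer sizes, and the choice of a mean over per-dimension utilities are assumptions.

```python
import torch
import torch.nn as nn


class DecoupledBangBangQ(nn.Module):
    """Sketch of a critic-only agent for continuous control via discretization.

    Each action dimension is restricted to a few discrete choices (bins=2 gives
    bang-bang control), and the joint Q-value is decomposed into a mean of
    per-dimension utilities, so greedy action selection is independent per
    dimension instead of exponential in the number of dimensions.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, bins: int = 2):
        super().__init__()
        self.act_dim, self.bins = act_dim, bins
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One head producing `bins` utilities for every action dimension.
        self.head = nn.Linear(hidden, act_dim * bins)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Shape (batch, act_dim, bins): utility of each discrete choice per dimension.
        return self.head(self.trunk(obs)).view(-1, self.act_dim, self.bins)

    def greedy_action(self, obs: torch.Tensor) -> torch.Tensor:
        utilities = self.forward(obs)          # (batch, act_dim, bins)
        idx = utilities.argmax(dim=-1)         # best bin per dimension
        # Map bin indices to evenly spaced actions in [-1, 1]; bins=2 -> {-1, +1}.
        return idx.float() / (self.bins - 1) * 2.0 - 1.0

    def joint_q(self, obs: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        utilities = self.forward(obs)
        chosen = utilities.gather(-1, idx.unsqueeze(-1)).squeeze(-1)  # (batch, act_dim)
        return chosen.mean(dim=-1)             # value decomposition across dimensions
```

Training would then proceed with a standard DQN-style TD target on `joint_q`; growing the number of `bins` from coarse to fine over training, as in the Growing Q-Networks entry, is a separate scheduling choice not shown here.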
This list is automatically generated from the titles and abstracts of the papers on this site.