Solving Continuous Control via Q-learning
- URL: http://arxiv.org/abs/2210.12566v2
- Date: Mon, 25 Sep 2023 22:49:51 GMT
- Title: Solving Continuous Control via Q-learning
- Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin
Riedmiller, Daniela Rus, Markus Wulfmeier
- Abstract summary: We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
- Score: 54.05120662838286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While there has been substantial success for solving continuous control with
actor-critic methods, simpler critic-only methods such as Q-learning find
limited application in the associated high-dimensional action spaces. However,
most actor-critic methods come at the cost of added complexity: heuristics for
stabilisation, compute requirements and wider hyperparameter search spaces. We
show that a simple modification of deep Q-learning largely alleviates these
issues. By combining bang-bang action discretization with value decomposition,
framing single-agent control as cooperative multi-agent reinforcement learning
(MARL), this simple critic-only approach matches performance of
state-of-the-art continuous actor-critic methods when learning from features or
pixels. We extend classical bandit examples from cooperative MARL to provide
intuition for how decoupled critics leverage state information to coordinate
joint optimization, and demonstrate surprisingly strong performance across a
variety of continuous control tasks.
Related papers
- Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions [18.643104368680593]
In reinforcement learning, off-policy actor-critic approaches like DDPG and TD3 are based on the deterministic policy gradient.
We introduce a new actor architecture that combines two simple insights: (i) use multiple actors and evaluate the Q-value maximizing action, and (ii) learn surrogates to the Q-function that are simpler to optimize with gradient-based methods.
arXiv Detail & Related papers (2024-10-15T17:58:03Z) - Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based update through continuous relaxation, combined with Quasi-Quantum Annealing (QQA)
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z) - Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z) - A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence [7.586600116278698]
Deep Actor-Critic network (DNN) combine Actor-Critic network (DNN) and deep neural network (DNN)
Deep Actor-Critic network (DNN) combine Actor-Critic network (DNN) and deep neural network (DNN)
Deep Actor-Critic network (DNN) combine Actor-Critic network (DNN) and deep neural network (DNN)
Deep Actor-Critic network (DNN) combine Actor-Critic network (DNN) and deep neural network (DNN)
Deep Actor-Critic network (DNN)
arXiv Detail & Related papers (2023-06-10T10:04:54Z) - Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach [38.76462300149459]
We develop a Multi-objective Correction (MoCo) method for multi-objective gradient optimization.
The unique feature of our method is that it can guarantee convergence without increasing the non fairness gradient.
arXiv Detail & Related papers (2022-10-23T05:54:26Z) - TASAC: a twin-actor reinforcement learning framework with stochastic
policy for batch process control [1.101002667958165]
Reinforcement Learning (RL) wherein an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context.
RL frameworks with actor-critic architecture have recently become popular for controlling systems where state and action spaces are continuous.
It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies due to the enhanced exploration due to simultaneous policy learning.
arXiv Detail & Related papers (2022-04-22T13:00:51Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.