Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
- URL: http://arxiv.org/abs/2404.04253v1
- Date: Fri, 5 Apr 2024 17:58:37 GMT
- Title: Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
- Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Markus Wulfmeier, Daniela Rus
- Abstract summary: In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
- Score: 51.83951489847344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics, while final performance does not visibly suffer in the absence of action penalization, in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, but action costs can be detrimental to exploration during early training. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution, taking advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
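To make the mechanism concrete, here is a minimal sketch (our own construction under illustrative assumptions, not the authors' implementation): each action dimension keeps an independent tabular Q-vector over a discrete grid on [-1, 1], and the grid is refined from bang-bang to finer resolutions during training, with new bins inheriting the value of their nearest old bin.

```python
import numpy as np

def action_grid(level):
    """Uniform grid over [-1, 1] with 2**level + 1 bins; level 0 is bang-bang."""
    return np.linspace(-1.0, 1.0, 2 ** level + 1)

class GrowingDecoupledQ:
    """Illustrative tabular stand-in for a decoupled Q-network with growing resolution."""

    def __init__(self, action_dim, max_level=3):
        self.action_dim = action_dim
        self.max_level = max_level
        self.level = 0  # start with bang-bang control: {-1, +1}
        self.q = [np.zeros(action_grid(0).size) for _ in range(action_dim)]

    def grow(self):
        """Refine control resolution; new bins inherit the Q-value of the nearest old bin."""
        if self.level >= self.max_level:
            return
        old_grid = action_grid(self.level)
        self.level += 1
        new_grid = action_grid(self.level)
        for d in range(self.action_dim):
            nearest = np.abs(new_grid[:, None] - old_grid[None, :]).argmin(axis=1)
            self.q[d] = self.q[d][nearest]

    def act(self, epsilon=0.1):
        """Epsilon-greedy selection per dimension; decoupling keeps this O(dim * bins)."""
        grid = action_grid(self.level)
        action = np.empty(self.action_dim)
        for d in range(self.action_dim):
            if np.random.rand() < epsilon:
                action[d] = np.random.choice(grid)
            else:
                action[d] = grid[self.q[d].argmax()]
        return action

agent = GrowingDecoupledQ(action_dim=38)  # matches dim(A) = 38 from the abstract
agent.act()   # coarse bang-bang actions early in training
agent.grow()  # grid becomes {-1, 0, +1}; later levels add midpoints
agent.act()
```

Because each dimension is maximized independently, growing the grid never requires a joint argmax over the exponentially large product action space.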
Related papers
- Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls [0.3441021278275805]
This paper analyzes the use of discrete action spaces, where the agent must choose from a predefined list of actions.
Experiments are conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock".
A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel.
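As a toy illustration of such a predefined action list (entirely our own assumption, not the paper's environment), including a no-thrust action gives the agent an explicit zero-fuel choice:

```python
import numpy as np

# Hypothetical action list: index -> (thrust_x, thrust_y, thrust_z) in newtons.
ACTIONS = [
    (0.0, 0.0, 0.0),  # action 0: no thrust, zero fuel use
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

def fuel_cost(action_index, dt=1.0, unit_cost=0.01):
    """Fuel spent scales with commanded thrust magnitude; action 0 is free."""
    return unit_cost * np.linalg.norm(ACTIONS[action_index]) * dt

fuel_cost(0)  # 0.0 -> coasting
fuel_cost(1)  # 0.01 -> thrusting costs fuel
```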
arXiv Detail & Related papers (2024-05-20T20:06:54Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
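A rough sketch of the flavor of such a term (our simplified reading; ACE learns its impact weights from data, whereas they are fixed placeholders here) weights per-dimension policy entropies by an estimate of each action dimension's causal effect on reward:

```python
import numpy as np

def causality_aware_entropy(stds, impact_weights):
    """Weighted sum of per-dimension Gaussian entropies H_i = 0.5 * log(2*pi*e*sigma_i^2)."""
    per_dim_entropy = 0.5 * np.log(2 * np.pi * np.e * np.asarray(stds) ** 2)
    weights = np.asarray(impact_weights, dtype=float)
    weights = weights / weights.sum()  # normalize the causal-impact estimates
    return float(np.sum(weights * per_dim_entropy))

stds = [0.3, 0.3, 0.3]     # current policy stddev per action dimension
impacts = [0.7, 0.2, 0.1]  # hypothetical causal-impact estimates
bonus = causality_aware_entropy(stds, impacts)  # exploration focused on dimension 1
```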
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, which frames single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
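A minimal sketch of that decomposition (our construction; the actual method learns per-dimension utilities with a neural critic) shows why the global argmax stays tractable: the joint Q-value is the mean of per-dimension utilities, so maximizing it factorizes into independent two-way choices.

```python
import numpy as np

BINS = np.array([-1.0, 1.0])  # bang-bang discretization per dimension

def joint_q(per_dim_q, action_indices):
    """per_dim_q: (N, 2) utilities over {-1, +1}; the joint Q-value is their mean."""
    return np.mean([per_dim_q[d, i] for d, i in enumerate(action_indices)])

def greedy_action(per_dim_q):
    """Maximizing the mean decomposes into N independent per-dimension argmaxes."""
    return BINS[per_dim_q.argmax(axis=1)]

per_dim_q = np.random.randn(6, 2)  # 6 action dimensions
action = greedy_action(per_dim_q)  # in {-1, +1}^6, found without a joint search
value = joint_q(per_dim_q, per_dim_q.argmax(axis=1))
```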
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies [45.20170713261535]
We investigate the phenomenon that trained agents often prefer actions at the boundaries of the continuous action space.
We replace the usual Gaussian distribution with a Bernoulli distribution that only considers the extremes along each action dimension.
Surprisingly, this achieves state-of-the-art performance on several continuous control benchmarks.
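A short sketch of the sampling step (our construction, with logits as stand-ins for a policy network's output): one independent Bernoulli per action dimension, mapped to the extremes {-1, +1}.

```python
import numpy as np

def sample_bang_bang(logits, rng=None):
    """logits: (N,) per-dimension Bernoulli logits; returns an action in {-1, +1}^N."""
    rng = rng if rng is not None else np.random.default_rng()
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))  # sigmoid
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)

sample_bang_bang([0.0, 2.0, -2.0])  # dimension 2 is likely +1, dimension 3 likely -1
```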
arXiv Detail & Related papers (2021-11-03T22:45:55Z)
- Adaptive control of a mechatronic system using constrained residual reinforcement learning [0.0]
We propose a simple, practical and intuitive approach to improve the performance of a conventional controller in uncertain environments.
Our approach is motivated by the observation that conventional controllers in industrial motion control value robustness over adaptivity to deal with different operating conditions.
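A minimal sketch of the constrained-residual idea (our construction; controller gains and bounds are illustrative): the learned policy only adds a bounded correction on top of the conventional controller, so behavior degrades gracefully toward the robust baseline.

```python
import numpy as np

def baseline_controller(error, kp=2.0):
    """Stand-in conventional controller (a bare proportional term here)."""
    return kp * error

def control(error, learned_residual, residual_bound=0.2):
    """Total command = baseline + hard-clipped residual, then actuator saturation."""
    u = baseline_controller(error) + np.clip(learned_residual, -residual_bound, residual_bound)
    return float(np.clip(u, -1.0, 1.0))

control(error=0.1, learned_residual=0.5)  # residual clipped to 0.2 before being applied
```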
arXiv Detail & Related papers (2021-10-06T08:13:05Z)
- Regret-optimal Estimation and Control [52.28457815067461]
We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form.
We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics.
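As a rough formalization of the objective (our notation; the paper's exact definitions may differ), the regret of a causal policy $\pi$ on a disturbance sequence $w$ is measured against the clairvoyant noncausal policy, and the regret-optimal controller minimizes the worst-case normalized regret:

$$\mathrm{Regret}(\pi, w) \;=\; J(\pi, w) \;-\; \min_{\pi_{\mathrm{nc}}\,\text{noncausal}} J(\pi_{\mathrm{nc}}, w), \qquad \pi^{\star} \;=\; \arg\min_{\pi\,\text{causal}} \; \sup_{w \neq 0} \; \frac{\mathrm{Regret}(\pi, w)}{\lVert w \rVert^{2}}$$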
arXiv Detail & Related papers (2021-06-22T23:14:21Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
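A rough sketch of the variance-control ingredient (our simplification of the general idea, not the paper's estimator): for a softmax policy over discrete actions, the gradient of the expected critic value can be computed exactly, with no sampling variance for that dimension.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def expected_value_gradient(logits, q_values):
    """Exact d/dlogits of E_{a~pi}[Q(a)] for a softmax policy over discrete actions."""
    pi = softmax(np.asarray(logits, dtype=float))
    expected_q = pi @ q_values
    # d/dz_k of sum_a pi_a * Q_a = pi_k * (Q_k - E[Q]); the mean acts as a built-in baseline
    return pi * (q_values - expected_q)

logits = np.zeros(4)  # 4 discrete actions, initially uniform
q_values = np.array([1.0, 0.0, 0.0, -1.0])
grad = expected_value_gradient(logits, q_values)  # zero-variance gradient estimate
```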
arXiv Detail & Related papers (2020-02-10T04:23:09Z)