Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
- URL: http://arxiv.org/abs/2404.04253v1
- Date: Fri, 5 Apr 2024 17:58:37 GMT
- Title: Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution
- Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Markus Wulfmeier, Daniela Rus
- Abstract summary: In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
- Score: 51.83951489847344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics, while final performance does not visibly suffer in the absence of action penalization, in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, but action costs can be detrimental to exploration during early training. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution, taking advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
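To make the mechanism concrete, here is a minimal sketch (our own construction under illustrative assumptions, not the authors' implementation): each action dimension keeps an independent tabular Q-vector over a discrete grid on [-1, 1], and the grid is refined from bang-bang to finer resolutions during training, with new bins inheriting the value of their nearest old bin.

```python
import numpy as np

def action_grid(level):
    """Uniform grid over [-1, 1] with 2**level + 1 bins; level 0 is bang-bang."""
    return np.linspace(-1.0, 1.0, 2 ** level + 1)

class GrowingDecoupledQ:
    """Illustrative tabular stand-in for a decoupled Q-network with growing resolution."""

    def __init__(self, action_dim, max_level=3):
        self.action_dim = action_dim
        self.max_level = max_level
        self.level = 0  # start with bang-bang control: {-1, +1}
        self.q = [np.zeros(action_grid(0).size) for _ in range(action_dim)]

    def grow(self):
        """Refine control resolution; new bins inherit the Q-value of the nearest old bin."""
        if self.level >= self.max_level:
            return
        old_grid = action_grid(self.level)
        self.level += 1
        new_grid = action_grid(self.level)
        for d in range(self.action_dim):
            nearest = np.abs(new_grid[:, None] - old_grid[None, :]).argmin(axis=1)
            self.q[d] = self.q[d][nearest]

    def act(self, epsilon=0.1):
        """Epsilon-greedy selection per dimension; decoupling keeps this O(dim * bins)."""
        grid = action_grid(self.level)
        action = np.empty(self.action_dim)
        for d in range(self.action_dim):
            if np.random.rand() < epsilon:
                action[d] = np.random.choice(grid)
            else:
                action[d] = grid[self.q[d].argmax()]
        return action

agent = GrowingDecoupledQ(action_dim=38)  # matches dim(A) = 38 from the abstract
agent.act()   # coarse bang-bang actions early in training
agent.grow()  # grid becomes {-1, 0, +1}; later levels add midpoints
agent.act()
```

Because each dimension is maximized independently, growing the grid never requires a joint argmax over the exponentially large product action space.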
Related papers
- Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls [0.3441021278275805]
This paper analyzes the use of discrete action spaces, where the agent must choose from a predefined list of actions.
Experiments are conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock".
A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel.
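As a toy illustration of such a predefined action list (entirely our own assumption, not the paper's environment), including a no-thrust action gives the agent an explicit zero-fuel choice:

```python
import numpy as np

# Hypothetical action list: index -> (thrust_x, thrust_y, thrust_z) in newtons.
ACTIONS = [
    (0.0, 0.0, 0.0),  # action 0: no thrust, zero fuel use
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

def fuel_cost(action_index, dt=1.0, unit_cost=0.01):
    """Fuel spent scales with commanded thrust magnitude; action 0 is free."""
    return unit_cost * np.linalg.norm(ACTIONS[action_index]) * dt

fuel_cost(0)  # 0.0 -> coasting
fuel_cost(1)  # 0.01 -> thrusting costs fuel
```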
arXiv Detail & Related papers (2024-05-20T20:06:54Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
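A rough sketch of the flavor of such a term (our simplified reading; ACE learns its impact weights from data, whereas they are fixed placeholders here) weights per-dimension policy entropies by an estimate of each action dimension's causal effect on reward:

```python
import numpy as np

def causality_aware_entropy(stds, impact_weights):
    """Weighted sum of per-dimension Gaussian entropies H_i = 0.5 * log(2*pi*e*sigma_i^2)."""
    per_dim_entropy = 0.5 * np.log(2 * np.pi * np.e * np.asarray(stds) ** 2)
    weights = np.asarray(impact_weights, dtype=float)
    weights = weights / weights.sum()  # normalize the causal-impact estimates
    return float(np.sum(weights * per_dim_entropy))

stds = [0.3, 0.3, 0.3]     # current policy stddev per action dimension
impacts = [0.7, 0.2, 0.1]  # hypothetical causal-impact estimates
bonus = causality_aware_entropy(stds, impacts)  # exploration focused on dimension 1
```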
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, which frames single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
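A minimal sketch of that decomposition (our construction; the actual method learns per-dimension utilities with a neural critic) shows why the global argmax stays tractable: the joint Q-value is the mean of per-dimension utilities, so maximizing it factorizes into independent two-way choices.

```python
import numpy as np

BINS = np.array([-1.0, 1.0])  # bang-bang discretization per dimension

def joint_q(per_dim_q, action_indices):
    """per_dim_q: (N, 2) utilities over {-1, +1}; the joint Q-value is their mean."""
    return np.mean([per_dim_q[d, i] for d, i in enumerate(action_indices)])

def greedy_action(per_dim_q):
    """Maximizing the mean decomposes into N independent per-dimension argmaxes."""
    return BINS[per_dim_q.argmax(axis=1)]

per_dim_q = np.random.randn(6, 2)  # 6 action dimensions
action = greedy_action(per_dim_q)  # in {-1, +1}^6, found without a joint search
value = joint_q(per_dim_q, per_dim_q.argmax(axis=1))
```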
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies [45.20170713261535]
We investigate the phenomenon that trained agents often prefer actions at the boundaries of the continuous action space.
We replace the usual Gaussian distribution with a Bernoulli distribution that only considers the extremes along each action dimension.
Surprisingly, this achieves state-of-the-art performance on several continuous control benchmarks.
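A short sketch of the sampling step (our construction, with logits as stand-ins for a policy network's output): one independent Bernoulli per action dimension, mapped to the extremes {-1, +1}.

```python
import numpy as np

def sample_bang_bang(logits, rng=None):
    """logits: (N,) per-dimension Bernoulli logits; returns an action in {-1, +1}^N."""
    rng = rng if rng is not None else np.random.default_rng()
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))  # sigmoid
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)

sample_bang_bang([0.0, 2.0, -2.0])  # dimension 2 is likely +1, dimension 3 likely -1
```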
arXiv Detail & Related papers (2021-11-03T22:45:55Z)
- Adaptive control of a mechatronic system using constrained residual reinforcement learning [0.0]
We propose a simple, practical and intuitive approach to improve the performance of a conventional controller in uncertain environments.
Our approach is motivated by the observation that conventional controllers in industrial motion control value robustness over adaptivity to deal with different operating conditions.
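A minimal sketch of the constrained-residual idea (our construction; controller gains and bounds are illustrative): the learned policy only adds a bounded correction on top of the conventional controller, so behavior degrades gracefully toward the robust baseline.

```python
import numpy as np

def baseline_controller(error, kp=2.0):
    """Stand-in conventional controller (a bare proportional term here)."""
    return kp * error

def control(error, learned_residual, residual_bound=0.2):
    """Total command = baseline + hard-clipped residual, then actuator saturation."""
    u = baseline_controller(error) + np.clip(learned_residual, -residual_bound, residual_bound)
    return float(np.clip(u, -1.0, 1.0))

control(error=0.1, learned_residual=0.5)  # residual clipped to 0.2 before being applied
```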
arXiv Detail & Related papers (2021-10-06T08:13:05Z)
- Regret-optimal Estimation and Control [52.28457815067461]
We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form.
We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics.
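As a rough formalization of the objective (our notation; the paper's exact definitions may differ), the regret of a causal policy $\pi$ on a disturbance sequence $w$ is measured against the clairvoyant noncausal policy, and the regret-optimal controller minimizes the worst-case normalized regret:

$$\mathrm{Regret}(\pi, w) \;=\; J(\pi, w) \;-\; \min_{\pi_{\mathrm{nc}}\,\text{noncausal}} J(\pi_{\mathrm{nc}}, w), \qquad \pi^{\star} \;=\; \arg\min_{\pi\,\text{causal}} \; \sup_{w \neq 0} \; \frac{\mathrm{Regret}(\pi, w)}{\lVert w \rVert^{2}}$$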
arXiv Detail & Related papers (2021-06-22T23:14:21Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
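A rough sketch of the variance-control ingredient (our simplification of the general idea, not the paper's estimator): for a softmax policy over discrete actions, the gradient of the expected critic value can be computed exactly, with no sampling variance for that dimension.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def expected_value_gradient(logits, q_values):
    """Exact d/dlogits of E_{a~pi}[Q(a)] for a softmax policy over discrete actions."""
    pi = softmax(np.asarray(logits, dtype=float))
    expected_q = pi @ q_values
    # d/dz_k of sum_a pi_a * Q_a = pi_k * (Q_k - E[Q]); the mean acts as a built-in baseline
    return pi * (q_values - expected_q)

logits = np.zeros(4)  # 4 discrete actions, initially uniform
q_values = np.array([1.0, 0.0, 0.0, -1.0])
grad = expected_value_gradient(logits, q_values)  # zero-variance gradient estimate
```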
arXiv Detail & Related papers (2020-02-10T04:23:09Z)