Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control
- URL: http://arxiv.org/abs/2502.02265v1
- Date: Tue, 04 Feb 2025 12:26:47 GMT
- Title: Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control
- Authors: Donghe Chen, Yubin Peng, Tengjie Zheng, Han Wang, Chaoran Qu, Lin Cheng
- Abstract summary: We introduce Adviser-Actor-Critic (AAC), designed to address the precision control dilemma.
AAC features an Adviser that mentors the actor to refine control actions, thereby enhancing the precision of goal attainment.
AAC outperformed standard RL algorithms in precision-critical, goal-conditioned tasks.
- Abstract: High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributable to network approximation inaccuracies and inadequate sample quality. These issues are exacerbated when the task requires the agent to achieve a precise goal state, as is common in robotics and other real-world applications. We introduce Adviser-Actor-Critic (AAC), designed to address this precision-control dilemma by combining the precision of feedback control theory with the adaptive learning capability of RL; AAC features an Adviser that mentors the actor to refine control actions, thereby enhancing the precision of goal attainment. In benchmark tests, AAC outperformed standard RL algorithms on precision-critical, goal-conditioned tasks, demonstrating its high precision, reliability, and robustness. Code is available at: https://anonymous.4open.science/r/Adviser-Actor-Critic-8AC5.
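To make the Adviser concrete, here is a minimal sketch under the assumption that it behaves like a PI feedback loop that shifts the goal handed to a frozen goal-conditioned actor, in the spirit of the feedback-control framing above; the class name, gains, and interface are illustrative, not the paper's implementation.

```python
import numpy as np

class PIAdviser:
    """Hypothetical PI-style adviser (our reading of the idea): it shifts
    the goal handed to a frozen goal-conditioned actor so that residual
    steady-state error is integrated away, as in classical feedback control."""

    def __init__(self, kp=0.5, ki=0.1):
        self.kp, self.ki = kp, ki
        self.integral = None

    def reset(self):
        self.integral = None

    def advise(self, achieved_goal, desired_goal):
        error = desired_goal - achieved_goal
        if self.integral is None:
            self.integral = np.zeros_like(error)
        self.integral += error
        # The actor chases this virtual goal; the true goal never changes.
        return desired_goal + self.kp * error + self.ki * self.integral

# Usage with any goal-conditioned policy pi(obs, goal) -> action:
# adviser = PIAdviser(); adviser.reset()
# action = pi(obs, adviser.advise(achieved_goal, desired_goal))
```

Because only the virtual goal is shifted, the trained policy itself stays untouched, which is what would let classical integral action remove residual error.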
Related papers
- Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.
Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
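For context on plain (non-adaptive) conformal risk control, a minimal sketch follows: choose the smallest threshold whose calibration-set risk bound stays below the target level. The paper's adaptive, conditional machinery goes beyond this; the assumptions (losses in [0, 1], monotone in lambda) are noted in comments.

```python
import numpy as np

def crc_threshold(losses_by_lambda, lambdas, alpha=0.1):
    """Plain conformal risk control (our simplification, not the paper's
    adaptive variant): losses_by_lambda[i, j] is the loss of calibration
    point i at threshold lambdas[j]; losses are assumed in [0, 1] and
    nonincreasing in lambda, with lambdas sorted ascending."""
    n = losses_by_lambda.shape[0]
    for lam, losses in zip(lambdas, losses_by_lambda.T):
        # Conservative bound: (n/(n+1)) * mean_loss + B/(n+1) with B = 1.
        if (losses.sum() + 1.0) / (n + 1) <= alpha:
            return lam
    return lambdas[-1]  # fall back to the most conservative threshold
```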
arXiv Detail & Related papers (2024-06-25T08:29:32Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that adaptive control resolution combined with value decomposition yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
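A minimal sketch of the coarse-to-fine idea, under the assumption that each growth step roughly doubles the bins per action dimension and that value decomposition amounts to a per-dimension greedy pick; bin counts and function names are illustrative.

```python
import numpy as np

def action_grid(low, high, level):
    """Illustrative coarse-to-fine discretization: level 0 gives 3 bins,
    and each further level roughly doubles the resolution (3, 5, 9, 17, ...)."""
    return np.linspace(low, high, 2 ** (level + 1) + 1)

def decoupled_greedy(q_per_dim, grids):
    """Value-decomposition flavour: choose each action dimension's bin
    independently from its own Q-values, sidestepping a joint argmax over
    the combinatorially large joint action space."""
    return np.array([grid[int(np.argmax(q))] for q, grid in zip(q_per_dim, grids)])

# Example: two action dimensions at refinement level 2 (9 bins each).
# grids = [action_grid(-1.0, 1.0, 2)] * 2
# decoupled_greedy([np.random.rand(9), np.random.rand(9)], grids)
```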
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
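To illustrate where a reward regularizer slots into preference-based reward learning, here is a generic Bradley-Terry loss with an L2 magnitude penalty as a stand-in; REBEL's actual regularizer (tied to the robotic RLHF setting) is not reproduced here, and `reg_coef` is an illustrative knob.

```python
import torch
import torch.nn.functional as F

def regularized_preference_loss(r_pref, r_rej, reg_coef=1e-3):
    """Generic Bradley-Terry preference loss plus an L2 reward-magnitude
    penalty as a stand-in regularizer; not the paper's specific term."""
    bt = -F.logsigmoid(r_pref - r_rej).mean()              # prefer r_pref
    reg = reg_coef * (r_pref.pow(2) + r_rej.pow(2)).mean()
    return bt + reg

# r_w = reward_model(preferred_traj); r_l = reward_model(rejected_traj)
# loss = regularized_preference_loss(r_w, r_l)
```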
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Distributional Soft Actor-Critic with Three Refinements [47.46661939652862]
Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks.
Many model-free RL algorithms experience performance degradation due to inaccurate value estimation.
This paper introduces three key refinements to DSACv1 to overcome these limitations and further improve Q-value estimation accuracy.
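As a rough illustration of distributional value estimation in the DSAC family, the sketch below has the critic output a Gaussian over returns and score it against a TD target by negative log-likelihood; twin critics, target clipping, and the paper's three specific refinements are all omitted.

```python
import torch
import torch.nn as nn

class GaussianReturnCritic(nn.Module):
    """Toy distributional critic: predicts a Gaussian over returns, with
    the Q-value as its mean. Not the paper's full architecture."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))              # outputs [mean, log_std]

    def forward(self, obs, act):
        mean, log_std = self.net(torch.cat([obs, act], dim=-1)).chunk(2, dim=-1)
        return mean, log_std.clamp(-5.0, 2.0).exp()

# Training signal: negative log-likelihood of a TD target under the
# predicted return distribution.
# mean, std = critic(obs, act)
# loss = -torch.distributions.Normal(mean, std).log_prob(td_target.detach()).mean()
```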
arXiv Detail & Related papers (2023-10-09T16:52:48Z)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Rather than maintaining a fixed exploration-exploitation balance, the controller employs caution and probing automatically in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed at such tasks but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Statistically Characterising Robustness and Fidelity of Quantum Controls and Quantum Control Algorithms [0.5599792629509229]
The robustness-infidelity measure (RIM$_p$) is introduced to quantify the robustness and fidelity of a controller.
Based on the RIM$_p$, an algorithmic robustness-infidelity measure (ARIM) is developed to quantify the expected robustness and fidelity of controllers.
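The summary names RIM$_p$ without stating it; the display below is one plausible formalization (a p-norm of infidelity over a noise distribution), offered as our reading rather than the paper's verbatim definition. The symbols $F$, $I$, $\delta$, and $\mathcal{D}$ are ours.

```latex
% One plausible reading, not a quotation from the paper: let F(\delta) be
% the fidelity achieved under a perturbation \delta \sim \mathcal{D}, and
% I(\delta) = 1 - F(\delta) the corresponding infidelity. Then
\[
  \mathrm{RIM}_p \;=\; \Bigl( \mathbb{E}_{\delta \sim \mathcal{D}}\bigl[\, I(\delta)^p \,\bigr] \Bigr)^{1/p} .
\]
% RIM_1 is then the mean infidelity, larger p weights worst-case
% perturbations more heavily, and ARIM would aggregate RIM_p over the
% controllers an algorithm returns.
```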
arXiv Detail & Related papers (2022-07-16T01:19:57Z)
- Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer [8.299945169799795]
This article develops and tests a Hyper-Actor Soft Actor-Critic (HASAC) RL framework based on the notions of task modularization and transfer learning.
The HASAC framework is tested on a new virtual robotic manipulation benchmark, Meta-World.
Numerical experiments show superior performance by HASAC over state-of-the-art deep RL algorithms in terms of reward value, success rate, and task completion time.
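The summary gives only the high-level idea; below is a speculative sketch of task modularization as a gating network that mixes reusable skill modules. This is loosely inspired by the description, not HASAC's actual architecture.

```python
import torch
import torch.nn as nn

class GatedModularActor(nn.Module):
    """Speculative sketch of task modularization: a gating network mixes
    reusable skill modules into one action."""
    def __init__(self, obs_dim, act_dim, n_skills=4, hidden=128):
        super().__init__()
        self.skills = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, act_dim))
            for _ in range(n_skills)])
        self.gate = nn.Sequential(nn.Linear(obs_dim, n_skills),
                                  nn.Softmax(dim=-1))

    def forward(self, obs):
        # Transfer idea: keep the skills frozen, re-learn only the gate per task.
        weights = self.gate(obs).unsqueeze(-1)                   # (B, K, 1)
        actions = torch.stack([s(obs) for s in self.skills], 1)  # (B, K, A)
        return (weights * actions).sum(dim=1)
```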
arXiv Detail & Related papers (2020-11-27T03:09:05Z)
- Constrained Model-Free Reinforcement Learning for Process Optimization [0.0]
Reinforcement learning (RL) is a control approach that can handle nonlinear optimal control problems.
Despite the promise exhibited, RL has yet to see marked translation to industrial practice.
We propose an 'oracle'-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with a high probability.
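To give the flavour of chance-constrained action selection, here is a toy sketch: a hypothetical violation-probability estimate (playing the role the paper assigns to its 'oracle') masks unsafe actions before a greedy pick. The names and fallback rule are illustrative, not the paper's algorithm.

```python
import numpy as np

def constrained_action(q_values, violation_prob, alpha=0.05):
    """Toy stand-in for oracle-assisted constrained action selection:
    mask actions whose estimated chance-constraint violation exceeds
    alpha, then act greedily among the rest."""
    feasible = violation_prob <= alpha
    if not feasible.any():            # fall back to the least-violating action
        return int(np.argmin(violation_prob))
    masked = np.where(feasible, q_values, -np.inf)
    return int(np.argmax(masked))

# q = np.array([1.0, 2.0, 0.5]); p = np.array([0.01, 0.2, 0.03])
# constrained_action(q, p)  # -> 0 (action 1 violates too often)
```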
arXiv Detail & Related papers (2020-11-16T13:16:22Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
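As a toy illustration of steering RL with a temporal-logic goal, the snippet below tracks a tiny hand-written automaton for a task like "eventually reach a, then eventually reach b" and pays reward on acceptance; the paper's construction uses limit-deterministic automata and a formal product MDP, which this does not reproduce.

```python
def step_automaton(q, label):
    """Tiny hand-rolled automaton for the LTL-like task
    'eventually a, then eventually b' (state 2 is accepting);
    a stand-in for the automata the paper uses."""
    if q == 0 and label == "a":
        return 1
    if q == 1 and label == "b":
        return 2
    return q

def automaton_reward(q_next):
    # Reward only on reaching the accepting state; an agent trained on the
    # product of environment state and automaton state then maximizes the
    # probability of satisfying the specification.
    return 1.0 if q_next == 2 else 0.0

# q = 0
# for label in ["c", "a", "a", "b"]:
#     q = step_automaton(q, label)
# automaton_reward(q)  # -> 1.0
```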
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.