Steady-State Error Compensation for Reinforcement Learning with Quadratic Rewards
- URL: http://arxiv.org/abs/2402.09075v2
- Date: Mon, 1 Apr 2024 02:09:32 GMT
- Title: Steady-State Error Compensation for Reinforcement Learning with Quadratic Rewards
- Authors: Liyao Wang, Zishun Zheng, Yuan Lin
- Abstract summary: The selection of a reward function in Reinforcement Learning (RL) has garnered significant attention because of its impact on system performance.
This study proposes an approach that introduces an integral term.
By integrating this term into quadratic-type reward functions, the RL algorithm is tuned to account for the history of rewards.
- Score: 1.0725881801927162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The selection of a reward function in Reinforcement Learning (RL) has garnered significant attention because of its impact on system performance. Significant steady-state errors often arise when quadratic reward functions are employed. Although absolute-value-type reward functions alleviate this problem, they tend to induce substantial fluctuations in specific system states, leading to abrupt changes. To address this challenge, this study proposes an approach that introduces an integral term. Integrating this term into quadratic-type reward functions makes the RL algorithm account for the history of rewards, which alleviates steady-state errors. Through experiments and performance evaluations on Adaptive Cruise Control (ACC) and lane-change models, we validate that the proposed method effectively diminishes steady-state errors without causing significant spikes in system states.
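As a rough illustration of the idea, the sketch below augments a standard quadratic tracking reward with a penalty on the accumulated (integral) error, in the spirit of the integral action in a PI controller. The class name, weights, time step, and scalar state/reference interface are illustrative assumptions, not the paper's exact formulation.

```python
class IntegralQuadraticReward:
    """Sketch of a quadratic tracking reward augmented with an integral term.

    The weights w_error, w_integral, w_action and the scalar state/reference
    interface are illustrative assumptions, not the paper's exact design.
    """

    def __init__(self, w_error=1.0, w_integral=0.1, w_action=0.01, dt=0.1):
        self.w_error = w_error        # weight on the instantaneous tracking error
        self.w_integral = w_integral  # weight on the accumulated (integral) error
        self.w_action = w_action      # weight on control effort
        self.dt = dt                  # simulation time step
        self.error_integral = 0.0     # running integral of the tracking error

    def reset(self):
        """Clear the accumulated error at the start of each episode."""
        self.error_integral = 0.0

    def __call__(self, state, reference, action):
        error = state - reference
        self.error_integral += error * self.dt
        # A purely quadratic reward, -w_e*e^2 - w_a*u^2, can leave a residual
        # offset (steady-state error). The extra -w_i*(integral of e)^2 term
        # keeps penalizing any persistent offset, so the learned policy is
        # pushed toward zero steady-state error, much like integral action
        # in a PI controller.
        return -(self.w_error * error ** 2
                 + self.w_integral * self.error_integral ** 2
                 + self.w_action * action ** 2)
```

In an ACC-style task, for example, `state` could be the measured inter-vehicle gap, `reference` the desired gap, and `action` the commanded acceleration, with the reward object reset at each episode boundary.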
Related papers
- The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret [64.04721528586747]
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging.
In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but also that for any fixed expected test error, there exist realistic data distributions that allow high worst-case regret.
We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF.
arXiv Detail & Related papers (2024-06-22T06:43:51Z) - REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Robustness Verification of Deep Reinforcement Learning Based Control Systems using Reward Martingales [13.069196356472272]
We present the first approach for robustness verification of DRL-based control systems by introducing reward martingales.
Our results provide provably quantitative certificates for the two questions.
We then show that reward martingales can be implemented and trained via neural networks, against different types of control policies.
arXiv Detail & Related papers (2023-12-15T11:16:47Z) - Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z) - Doubly Robust Proximal Causal Learning for Continuous Treatments [56.05592840537398]
We propose a kernel-based doubly robust causal learning estimator for continuous treatments.
We show that its oracle form is a consistent approximation of the influence function.
We then provide a comprehensive convergence analysis in terms of the mean square error.
arXiv Detail & Related papers (2023-09-22T12:18:53Z) - Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z) - Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations [92.25604137490168]
This paper introduces a theoretical investigation into efficient reinforcement learning in control systems.
We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{\mathrm{poly}(H)\,SAK})$, for RL in the delayed and missing observation settings.
arXiv Detail & Related papers (2023-06-02T02:46:39Z) - The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.
We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time.
We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z) - Learning Long-Term Reward Redistribution via Randomized Return Decomposition [18.47810850195995]
We consider the problem formulation of episodic reinforcement learning with trajectory feedback.
This setting involves an extreme delay of reward signals, in which the agent obtains only a single reward signal at the end of each trajectory.
We propose a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning.
arXiv Detail & Related papers (2021-11-26T13:23:36Z) - Identifiability in inverse reinforcement learning [0.0]
Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem.
We provide a resolution to this non-identifiability for problems with entropy regularization.
arXiv Detail & Related papers (2021-06-07T10:35:52Z) - Assessment of Reward Functions for Reinforcement Learning Traffic Signal Control under Real-World Limitations [0.0]
This paper compares the performance of agents using different reward functions in a simulation of a junction in Greater Manchester, UK.
We find that speed maximisation resulted in the lowest average waiting times across all demand levels, displaying significantly better performance than other rewards previously introduced in the literature.
arXiv Detail & Related papers (2020-08-26T15:47:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.