On Reducing Undesirable Behavior in Deep Reinforcement Learning Models
- URL: http://arxiv.org/abs/2309.02869v2
- Date: Mon, 11 Sep 2023 14:09:36 GMT
- Title: On Reducing Undesirable Behavior in Deep Reinforcement Learning Models
- Authors: Ophir M. Carmel, Guy Katz
- Abstract summary: We propose a novel framework aimed at drastically reducing the undesirable behavior of DRL-based software.
Our framework can assist in providing engineers with a comprehensible characterization of such undesirable behavior.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) has proven extremely useful in a large
variety of application domains. However, even successful DRL-based software can
exhibit highly undesirable behavior. This is due to DRL training being based on
maximizing a reward function, which typically captures general trends but
cannot precisely capture, or rule out, certain behaviors of the system. In this
paper, we propose a novel framework aimed at drastically reducing the
undesirable behavior of DRL-based software, while maintaining its excellent
performance. In addition, our framework can assist in providing engineers with
a comprehensible characterization of such undesirable behavior. Under the hood,
our approach is based on extracting decision tree classifiers from erroneous
state-action pairs, and then integrating these trees into the DRL training
loop, penalizing the system whenever it commits an error. We provide a
proof-of-concept implementation of our approach, and use it to evaluate the
technique on three significant case studies. We find that our approach can
extend existing frameworks in a straightforward manner, and incurs only a
slight overhead in training time. Further, it incurs only a very slight hit to
performance, and in some cases even improves it, while significantly reducing
the frequency of undesirable behavior.
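As a rough illustration of the mechanism the abstract describes, the sketch below fits a one-level decision tree (a hand-rolled stump, to keep the example dependency-free; the paper presumably extracts full decision-tree classifiers) to labeled state-action pairs, then uses it to penalize the reward during training. The data, function names, and penalty value are illustrative assumptions, not taken from the paper's implementation.

```python
def fit_stump(pairs, labels):
    """Fit a one-level decision tree over (state, action) feature vectors.

    pairs: list of equal-length feature tuples; labels: True if the pair
    was flagged as erroneous. Returns (feature_index, threshold) such that
    feature <= threshold predicts "erroneous".
    """
    best = None  # (accuracy, feature index, threshold)
    for f in range(len(pairs[0])):
        for thr in sorted({p[f] for p in pairs}):
            pred = [p[f] <= thr for p in pairs]
            acc = sum(p == y for p, y in zip(pred, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, f, thr)
    return best[1], best[2]

def shaped_reward(base_reward, state_action, stump, penalty=1.0):
    """Penalize the environment reward whenever the tree flags an error."""
    f, thr = stump
    return base_reward - penalty if state_action[f] <= thr else base_reward

# Toy data: errors occur exactly when the first feature is small.
pairs = [(0.1, 1.0), (0.2, 0.0), (0.8, 1.0), (0.9, 0.0)]
labels = [True, True, False, False]
stump = fit_stump(pairs, labels)  # learns (feature 0, threshold 0.2)
```

During training, `shaped_reward` would stand in for the raw environment reward in the update step, so that transitions the tree classifies as erroneous are explicitly discouraged.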
Related papers
- Human-Readable Programs as Actors of Reinforcement Learning Agents Using Critic-Moderated Evolution [4.831084635928491]
We build on TD3 and use its critics as the basis of the objective function of a genetic algorithm that synthesizes the program.
Our approach steers the program toward genuinely high rewards, rather than merely minimizing a Mean Squared Error.
arXiv Detail & Related papers (2024-10-29T10:57:33Z) - UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning [10.593924216046977]
We first theoretically analyze the overestimation phenomenon caused by MSE and provide a theoretical upper bound on the overestimation error.
Finally, we propose an offline RL algorithm based on an underestimation operator and a diffusion policy model.
arXiv Detail & Related papers (2024-06-05T14:37:42Z) - Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning [10.117626902557927]
Current Reinforcement Learning (RL) methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications.
In this paper, we build on existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF) to reduce sample complexity.
We show that our method finds a policy that successfully lands a quadcopter with less than half the real-world data required by the state-of-the-art Soft Actor-Critic algorithm.
arXiv Detail & Related papers (2024-03-18T19:51:17Z) - Solving Offline Reinforcement Learning with Decision Tree Regression [0.0]
This study presents a novel approach to addressing offline reinforcement learning problems by reframing them as regression tasks.
We introduce two distinct frameworks: return-conditioned and return-weighted decision tree policies.
Despite the simplification inherent in this reformulated approach to offline RL, our agents demonstrate performance that is at least on par with the established methods.
arXiv Detail & Related papers (2024-01-21T23:50:46Z) - Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Flow to Control: Offline Reinforcement Learning with Lossless Primitive
Discovery [31.49638957903016]
Offline reinforcement learning (RL) enables the agent to learn effectively from logged data.
We show that our method has strong representational ability for policies and achieves superior performance on most tasks.
arXiv Detail & Related papers (2022-12-02T11:35:51Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
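The abstract gives no algorithmic detail, but an importance-sampling-based Q-learning update of the kind described typically reweights the TD error of an oversampled rare transition by the ratio of its true probability to its sampling probability. The sketch below is a generic illustration under that assumption; none of the names or constants come from the FIRE paper.

```python
def weighted_q_update(q, s, a, reward, s_next, actions,
                      p_true, p_sample, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a transition drawn from a biased
    sampler. The TD error is reweighted by w = p_true / p_sample so that
    oversampled rare (failure) transitions do not dominate the estimate."""
    w = p_true / p_sample
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    td_error = reward + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * w * td_error
    return q
```

A failure transition sampled five times more often than it truly occurs (p_sample = 0.5, p_true = 0.1) gets its update scaled down by w = 0.2, keeping the value estimates unbiased.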
arXiv Detail & Related papers (2022-09-28T19:49:39Z) - DR3: Value-Based Deep Reinforcement Learning Requires Explicit
Regularization [125.5448293005647]
We discuss how the implicit regularization effect of SGD, seen in supervised learning, could in fact be harmful in offline deep RL.
Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions.
We propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer.
arXiv Detail & Related papers (2021-12-09T06:01:01Z) - Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
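The "advantage" of a positive action over the average case, computed from sampled negatives as described above, reduces to a simple comparison of Q-values. A minimal sketch (the function name is illustrative, not from the paper's code):

```python
def advantage_over_sampled(q_positive, q_negatives):
    """Advantage of the observed (positive) action's Q-value over the
    average Q-value of the sampled negative actions."""
    return q_positive - sum(q_negatives) / len(q_negatives)

# The positive item's Q-value is compared against the mean of the negatives:
adv = advantage_over_sampled(2.0, [1.0, 0.0, 2.0])  # 2.0 - 1.0 = 1.0
```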
arXiv Detail & Related papers (2021-11-05T12:51:15Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.