Boolean Decision Rules for Reinforcement Learning Policy Summarisation
- URL: http://arxiv.org/abs/2207.08651v1
- Date: Mon, 18 Jul 2022 14:51:24 GMT
- Title: Boolean Decision Rules for Reinforcement Learning Policy Summarisation
- Authors: James McCarthy, Rahul Nair, Elizabeth Daly, Radu Marinescu, Ivana
Dusparic
- Abstract summary: We create a rule-based summary of an agent's policy, demonstrated on a lava gridworld.
We discuss possible avenues to introduce safety into an RL agent's policy by using rules generated by this rule-based model as constraints imposed on the agent's policy.
- Score: 16.969788244589388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainability of Reinforcement Learning (RL) policies remains a challenging
research problem, particularly when considering RL in a safety context.
Understanding the decisions and intentions of an RL policy offers avenues to
incorporate safety into the policy by limiting undesirable actions. We propose
the use of a Boolean Decision Rules model to create a post-hoc rule-based
summary of an agent's policy. We evaluate our proposed approach using a DQN
agent trained on an implementation of a lava gridworld and show that, given a
hand-crafted feature representation of this gridworld, simple generalised rules
can be created, giving a post-hoc explainable summary of the agent's policy. We
discuss possible avenues to introduce safety into an RL agent's policy by using
rules generated by this rule-based model as constraints imposed on the agent's
policy, as well as discuss how creating simple rule summaries of an agent's
policy may help in the debugging process of RL agents.
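
To make the summarisation pipeline concrete, the sketch below shows the post-hoc step under assumed details: query a trained policy over a lava gridworld, describe each state with hand-crafted boolean features, and fit an interpretable rule learner to the resulting (features, action) pairs. The paper fits a Boolean Decision Rules model to a DQN policy; here a shallow decision tree and a scripted stand-in policy keep the sketch self-contained and runnable. The gridworld layout, feature names, and the `greedy_action` stub are illustrative assumptions, not the authors' setup.

```python
# Minimal, assumption-laden sketch of post-hoc rule summarisation of a policy
# on a toy lava gridworld. A decision tree stands in for the paper's Boolean
# Decision Rules model, and a scripted policy stands in for a trained DQN.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

GRID = 5
LAVA = {(1, 2), (2, 2), (3, 2)}   # assumed lava "wall" in column 2
GOAL = (4, 4)

FEATURE_NAMES = ["lava_right", "lava_below", "goal_right", "goal_below"]

def features(pos):
    """Hand-crafted boolean features of a grid position (illustrative)."""
    r, c = pos
    return [
        int((r, c + 1) in LAVA),   # lava immediately to the right
        int((r + 1, c) in LAVA),   # lava immediately below
        int(c < GOAL[1]),          # goal lies to the right
        int(r < GOAL[0]),          # goal lies below
    ]

def greedy_action(pos):
    """Stand-in for argmax_a Q(s, a) of a trained DQN: move toward the goal
    without stepping into lava (scripted, purely for illustration)."""
    r, c = pos
    if c < GOAL[1] and (r, c + 1) not in LAVA:
        return "right"
    if r < GOAL[0] and (r + 1, c) not in LAVA:
        return "down"
    return "down"

# 1. Query the policy across the (non-lava) state space to label every state.
states = [(r, c) for r in range(GRID) for c in range(GRID) if (r, c) not in LAVA]
X = np.array([features(s) for s in states])
y = np.array([greedy_action(s) for s in states])

# 2. Fit an interpretable surrogate and print it as IF-THEN style rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(surrogate, feature_names=FEATURE_NAMES))

# 3. A recovered rule such as "IF lava_right THEN action is not 'right'" could
#    then be enforced as an action mask, i.e. a safety constraint on the policy.
```

Swapping the decision tree for a Boolean Decision Rules learner and the scripted policy for the greedy action of a trained DQN recovers the setup the abstract describes; the printed rules would then serve both as an explainable summary and as candidate constraints on the agent's behaviour.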
Related papers
- SPoRt -- Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL [54.022106606140774]
We present theoretical results that provide a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setup.
We also present SPoRt, which enables the user to trade off safety guarantees in exchange for task-specific performance.
arXiv Detail & Related papers (2025-04-08T19:09:07Z) - Residual Policy Gradient: A Reward View of KL-regularized Objective [48.39829592175419]
Reinforcement Learning and Imitation Learning have achieved widespread success in many domains but remain constrained during real-world deployment.
Policy customization has been introduced, aiming to adapt a prior policy while preserving its inherent properties and meeting new task-specific requirements.
A principled approach to policy customization is Residual Q-Learning (RQL), which formulates the problem as a Markov Decision Process (MDP) and derives a family of value-based learning algorithms.
We introduce Residual Policy Gradient (RPG), which extends RQL to policy gradient methods, allowing policy customization in gradient-based RL settings.
arXiv Detail & Related papers (2025-03-14T02:30:13Z) - Rule-Guided Reinforcement Learning Policy Evaluation and Improvement [9.077163856137505]
LEGIBLE is a novel approach to improving deep reinforcement learning policies.
It starts by mining rules from a deep RL policy, constituting a partially symbolic representation.
In the second step, we generalize the mined rules using domain knowledge expressed as metamorphic relations.
The third step is evaluating generalized rules to determine which generalizations improve performance when enforced.
arXiv Detail & Related papers (2025-03-12T11:13:08Z) - Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates [0.0]
Designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict.
We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time.
Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs.
arXiv Detail & Related papers (2024-07-31T20:21:15Z) - Policy Bifurcation in Safe Reinforcement Learning [35.75059015441807]
In some scenarios the feasible policy should be discontinuous or multi-valued; interpolating between discontinuous local optima can inevitably lead to constraint violations.
We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of bifurcation in safe RL.
We propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output.
arXiv Detail & Related papers (2024-03-19T15:54:38Z) - Compositional Policy Learning in Stochastic Control Systems with Formal
Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z) - Offline Reinforcement Learning with On-Policy Q-Function Regularization [57.09073809901382]
We deal with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy.
We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
arXiv Detail & Related papers (2023-07-25T21:38:08Z) - Counterfactual Explanation Policies in RL [3.674863913115432]
COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL.
arXiv Detail & Related papers (2023-07-25T01:14:56Z) - Offline Reinforcement Learning with Closed-Form Policy Improvement
Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z) - Reinforcement Learning [36.664136621546575]
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains.
In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy.
arXiv Detail & Related papers (2020-05-29T06:53:29Z) - BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance.
arXiv Detail & Related papers (2020-02-08T01:59:33Z) - Preventing Imitation Learning with Adversarial Policy Ensembles [79.81807680370677]
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy.
How can we protect against external observers cloning our proprietary policies?
We introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies.
arXiv Detail & Related papers (2020-01-31T01:57:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.