Towards Interpretable-AI Policies Induction using Evolutionary Nonlinear
Decision Trees for Discrete Action Systems
- URL: http://arxiv.org/abs/2009.09521v2
- Date: Tue, 6 Apr 2021 17:28:51 GMT
- Title: Towards Interpretable-AI Policies Induction using Evolutionary Nonlinear
Decision Trees for Discrete Action Systems
- Authors: Yashesh Dhebar, Kalyanmoy Deb, Subramanya Nageshrao, Ling Zhu and
Dimitar Filev
- Abstract summary: We use a recently proposed nonlinear decision-tree (NLDT) approach to find a hierarchical set of control rules.
We find relatively simple and interpretable rules, involving one to four nonlinear terms per rule, while achieving on-par closed-loop performance.
- Score: 8.322816790979285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box AI induction methods such as deep reinforcement learning (DRL) are
increasingly being used to find optimal policies for a given control task.
Although policies represented using a black-box AI are capable of efficiently
executing the underlying control task and achieving optimal closed-loop
performance, the developed control rules are often complex and neither
interpretable nor explainable. In this paper, we use a recently proposed
nonlinear decision-tree (NLDT) approach to find a hierarchical set of control
rules in an attempt to maximize the open-loop performance for approximating and
explaining the pre-trained black-box DRL (oracle) agent using the labelled
state-action dataset. Recent advances in nonlinear optimization approaches
using evolutionary computation facilitate finding a hierarchical set of
nonlinear control rules as a function of state variables using a
computationally fast bilevel optimization procedure at each node of the
proposed NLDT. Additionally, we propose a re-optimization procedure for
enhancing closed-loop performance of an already derived NLDT. We evaluate our
proposed methodologies (open and closed-loop NLDTs) on different control
problems having multiple discrete actions. In all these problems, our proposed
approach finds relatively simple and interpretable rules involving one to four
nonlinear terms per rule, while achieving closed-loop performance on par with
that of the trained black-box DRL agent. A
post-processing approach for simplifying the NLDT is also suggested. The
obtained results are inspiring as they suggest the replacement of complicated
black-box DRL policies involving thousands of parameters (making them
non-interpretable) with relatively simple and interpretable policies. These
results are encouraging and motivate further application of the proposed
approach to more complex control tasks.
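As a concrete picture of what such a policy looks like, here is a minimal Python sketch of evaluating a learned NLDT at inference time. It assumes the power-law rule structure used in the NLDT literature (each node tests a weighted sum of products of state variables raised to small integer exponents, with the exponents found by the upper-level evolutionary search and the weights and bias by the lower-level continuous optimization); the tree shape and all numbers below are illustrative placeholders, not values from the paper.

```python
import numpy as np

class NLDTNode:
    """One conditional node of a nonlinear decision tree (NLDT).

    The split rule follows the power-law form used in the NLDT literature:
        f(x) = sum_i w_i * prod_j x_j ** b_ij + theta,
    routing left when f(x) <= 0. The integer exponents b_ij come from the
    upper-level (discrete, evolutionary) search and the weights w and bias
    theta from the lower-level (continuous) optimization of the bilevel
    procedure described in the abstract.
    """

    def __init__(self, weights, exponents, theta, left, right):
        self.w = np.asarray(weights, dtype=float)    # one weight per term
        self.B = np.asarray(exponents, dtype=float)  # shape: terms x state dim
        self.theta = float(theta)
        self.left, self.right = left, right          # subtree or action label

    def rule(self, x):
        # Each term multiplies state variables raised to small integer
        # powers; the paper reports one to four such terms per rule.
        # (States are assumed pre-normalized to a positive range so the
        # power terms stay well-defined.)
        terms = np.prod(np.asarray(x, dtype=float) ** self.B, axis=1)
        return float(self.w @ terms + self.theta)

def act(node, x):
    """Traverse the tree until a leaf (a discrete action label) is reached."""
    while isinstance(node, NLDTNode):
        node = node.left if node.rule(x) <= 0.0 else node.right
    return node

# Illustrative two-rule tree over a 3-dimensional state with discrete
# actions {0, 1, 2}; all numbers here are made-up placeholders.
inner = NLDTNode([0.8], [[0, 2, -1]], -0.3, left=1, right=2)
root = NLDTNode([1.2, -0.5], [[1, 0, 0], [0, 1, 1]], 0.1, left=0, right=inner)

print(act(root, [0.4, 0.7, 1.3]))  # -> 2 for these placeholder values
```

Interpretability here comes directly from the rule form: with only one to four power-law terms per node, each branch of the tree can be read as an explicit algebraic condition on the state.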
Related papers
- FlowPG: Action-constrained Policy Gradient with Normalizing Flows [14.98383953401637]
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision-making problems.
A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each step.
arXiv Detail & Related papers (2024-02-07T11:11:46Z) - Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming
for Policy Optimization in Mixed Discrete-Continuous MDPs [23.87856533426793]
CGPO provides bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics.
CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions.
We experimentally demonstrate the applicability of CGPO in diverse domains, including inventory control and the management of a system of water reservoirs; a generic sketch of the constraint-generation idea appears below.
arXiv Detail & Related papers (2024-01-20T07:12:57Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
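The constraint-generation idea named in CGPO's title can be illustrated independently of the paper's specifics. Below is a toy Python sketch of the generic cutting-plane loop (my own illustration, not code from the paper): a relaxed linear program is solved against a small active set of constraints, an inner search finds the most violated member of a large constraint family, and only that constraint is added before re-solving.

```python
import numpy as np
from scipy.optimize import linprog

# Toy semi-infinite problem: maximize x0 + x1 subject to
#   x0 + s * x1 <= 1   for every s in [0, 1],   0 <= x <= 5.
# Constraint generation: solve with a small active set of s-values,
# add the most violated s found by an inner search, and repeat.

c = np.array([-1.0, -1.0])           # linprog minimizes, so negate
bounds = [(0.0, 5.0), (0.0, 5.0)]
s_grid = np.linspace(0.0, 1.0, 101)  # inner "adversary" searches this grid
active = []                          # s-values whose constraints are included

for it in range(20):
    if active:
        A_ub = np.array([[1.0, s] for s in active])
        b_ub = np.ones(len(active))
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    else:
        res = linprog(c, bounds=bounds)
    x = res.x
    # Inner problem: find the most violated constraint at the current x.
    violations = x[0] + s_grid * x[1] - 1.0
    worst = int(np.argmax(violations))
    if violations[worst] <= 1e-9:
        break                        # all constraints satisfied: done
    active.append(float(s_grid[worst]))

print("solution:", x, "with", len(active), "generated constraints")
```

On this toy problem the loop converges after generating a single constraint (s = 1), which is the appeal of the approach: the problem actually solved stays far smaller than the full constraint family.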
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Offline Policy Optimization in RL with Variance Regularizaton [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double-sampling issues when computing the gradient of the variance regularizer (see the identity sketched below).
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm.
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multi-objective navigation problem with an arbitrary ordering of objectives, both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - A Policy Efficient Reduction Approach to Convex Constrained Deep
Reinforcement Learning [2.811714058940267]
We propose a new variant of the conditional gradient (CG) type algorithm, which generalizes the minimum norm point (MNP) method.
Our method reduces the memory costs by an order of magnitude, and achieves better performance, demonstrating both its effectiveness and efficiency.
arXiv Detail & Related papers (2021-08-29T20:51:32Z) - Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes the policy for violating the SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO).
arXiv Detail & Related papers (2021-07-13T21:39:21Z) - Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement
Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update (a generic Frank-Wolfe step is sketched below).
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Learning Constrained Adaptive Differentiable Predictive Control Policies
With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model (a minimal sketch of this recipe follows below).
arXiv Detail & Related papers (2020-04-23T14:24:44Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
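The recipe described above, backpropagating a control loss through a simulated closed loop, fits in a few lines of autodiff code. Below is a minimal PyTorch sketch under assumed toy dynamics (a hypothetical double integrator with penalty weights of my choosing; DPC itself targets linear systems with constraint handling and guarantees well beyond this sketch).

```python
import torch

# Toy discrete-time double integrator x_{t+1} = A x + B u (assumed values).
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
u_max = 1.0

# Small neural control policy u = pi(x).
policy = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    # Batch of random initial states; roll the closed loop forward and
    # accumulate a quadratic regulation loss plus a constraint penalty,
    # i.e. the "MPC loss + constraints penalties" backpropagated through
    # the differentiable closed-loop model.
    x = 2.0 * torch.rand(64, 2) - 1.0
    loss = torch.tensor(0.0)
    for t in range(20):
        u = policy(x)
        loss = loss + (x ** 2).sum(dim=1).mean() + 0.1 * (u ** 2).mean()
        loss = loss + 10.0 * torch.relu(u.abs() - u_max).mean()  # |u| <= u_max
        x = x @ A.T + u @ B.T   # differentiable dynamics step
    opt.zero_grad()
    loss.backward()             # gradients flow through the whole rollout
    opt.step()

print("final loss:", float(loss))
```

The relu term is the simplest stand-in for the constraint penalties mentioned in the summary; DPC's actual formulation is more structured, but the gradient pathway (loss through rollout into policy parameters) is the same.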
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)