Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts
- URL: http://arxiv.org/abs/2006.05911v3
- Date: Thu, 18 Nov 2021 16:15:44 GMT
- Title: Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts
- Authors: Riad Akrour, Davide Tateo, Jan Peters
- Abstract summary: We propose a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
- Score: 35.80418547105711
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has demonstrated its ability to solve high
dimensional tasks by leveraging non-linear function approximators. However,
these successes are mostly achieved by 'black-box' policies in simulated
domains. When deploying RL to the real world, several concerns regarding the
use of a 'black-box' policy might be raised. In order to make the learned
policies more transparent, we propose in this paper a policy iteration scheme
that retains a complex function approximator for its internal value predictions
but constrains the policy to have a concise, hierarchical, and human-readable
structure, based on a mixture of interpretable experts. Each expert selects a
primitive action according to a distance to a prototypical state. A key design
decision to keep such experts interpretable is to select the prototypical
states from trajectory data. The main technical contribution of the paper is to
address the challenges introduced by this non-differentiable prototypical state
selection procedure. Experimentally, we show that our proposed algorithm can
learn compelling policies on continuous action deep RL benchmarks, matching the
performance of neural network based policies, but returning policies that are
more amenable to human inspection than neural network or linear-in-feature
policies.
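
To make the policy structure described in the abstract concrete, here is a minimal sketch (not the authors' exact parameterization) of a mixture of interpretable experts: each expert is anchored to a prototypical state taken from logged trajectory data and carries a primitive action, and the distance from the current state to each prototype sets the mixture weights. The class name, the RBF-style gating, and the bandwidth parameter are illustrative assumptions; the non-differentiable step the paper actually addresses, choosing which trajectory states become prototypes, is simply taken as given here.

```python
import numpy as np

class InterpretableMixturePolicy:
    """Sketch of a mixture of interpretable experts: each expert pairs a
    prototypical state (picked from trajectory data) with a primitive
    action, and experts whose prototypes lie closer to the current state
    receive more weight. Illustrative only, not the paper's algorithm."""

    def __init__(self, prototype_states, primitive_actions, bandwidth=1.0):
        self.prototypes = np.asarray(prototype_states, dtype=float)  # (K, state_dim)
        self.actions = np.asarray(primitive_actions, dtype=float)    # (K, action_dim)
        self.bandwidth = bandwidth

    def expert_weights(self, state):
        # Squared distance from the current state to every prototypical state.
        d2 = np.sum((self.prototypes - state) ** 2, axis=1)
        # Softmax over negative distances: nearby prototypes dominate.
        logits = -d2 / self.bandwidth
        w = np.exp(logits - logits.max())
        return w / w.sum()

    def act(self, state):
        # Deterministic variant: distance-weighted blend of primitive actions.
        w = self.expert_weights(np.asarray(state, dtype=float))
        return w @ self.actions

    def explain(self, state, top_k=3):
        # Human-readable trace: which prototypes drove the decision, and how much.
        w = self.expert_weights(np.asarray(state, dtype=float))
        order = np.argsort(w)[::-1][:top_k]
        return [(int(i), float(w[i]), self.actions[i]) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protos = rng.normal(size=(5, 3))             # pretend: states from logged trajectories
    prims = rng.uniform(-1.0, 1.0, size=(5, 2))  # one primitive action per expert
    pi = InterpretableMixturePolicy(protos, prims, bandwidth=0.5)
    s = rng.normal(size=3)
    print("action:", pi.act(s))
    print("top experts:", pi.explain(s))
```

The explain method is the point of the exercise: every decision can be traced back to a handful of concrete states and their associated primitive actions, which is what makes this policy family easier to inspect than a neural network or a linear-in-feature policy.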
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
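
As a minimal illustration of the "deploy the deterministic version" practice summarized above (a generic sketch, not this paper's algorithm), the snippet below uses a linear-Gaussian policy whose exploration scale sigma is the knob behind the sample-complexity versus deployment-performance trade-off; the linear parameterization and all names are assumptions.

```python
import numpy as np

def gaussian_policy_action(theta, state, sigma, rng, deterministic=False):
    """Linear-Gaussian policy sketch: the mean action is theta @ state and
    sigma is the exploration scale used during learning. The deployed
    deterministic version simply returns the mean."""
    mean = theta @ state
    if deterministic:
        return mean                                    # deployment
    return mean + sigma * rng.normal(size=mean.shape)  # exploration during learning

rng = np.random.default_rng(0)
theta = rng.normal(size=(2, 4))   # toy policy parameters
s = rng.normal(size=4)
print(gaussian_policy_action(theta, s, sigma=0.3, rng=rng))                      # training-time sample
print(gaussian_policy_action(theta, s, sigma=0.3, rng=rng, deterministic=True))  # deployed action
```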
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
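
A rough sketch of the distillation step described above, using scikit-learn's GradientBoostingRegressor as a stand-in for the paper's GBM student and a toy function in place of the trained neural expert; the 1-D action and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-in for a trained neural expert (1-D action for simplicity).
def expert_policy(state):
    return np.tanh(state @ np.array([0.7, -1.2, 0.4]))

# 1) Collect (state, action) pairs by querying the expert.
states = rng.normal(size=(2000, 3))
actions = np.array([expert_policy(s) for s in states])

# 2) Fit the interpretable student (a GBM) to imitate the expert's action map.
student = GradientBoostingRegressor(n_estimators=200, max_depth=3)
student.fit(states, actions)

# 3) The student can now be inspected and deployed in place of the expert.
print("imitation MSE:", float(np.mean((student.predict(states) - actions) ** 2)))
print("feature importances:", student.feature_importances_)
```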
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint.
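
For reference, a minimal sketch of the ordinary trajectory-wise importance sampling estimator mentioned above; the data layout and function names are assumptions. Optimizing a policy directly against such an estimator can put large weight on actions the behavior policy rarely took, which is the kind of overfitting the per-state-neighborhood constraint described above is meant to avoid.

```python
import numpy as np

def is_estimate(trajectories, target_logp, behavior_logp, gamma=0.99):
    """Ordinary trajectory-wise importance sampling: reweight each logged
    discounted return by the product of target/behavior probability ratios
    along the trajectory. Data layout is an assumption for illustration."""
    values = []
    for traj in trajectories:                 # traj: list of (state, action, reward)
        log_w, ret = 0.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            log_w += target_logp(s, a) - behavior_logp(s, a)
            ret += (gamma ** t) * r
        values.append(np.exp(log_w) * ret)
    return float(np.mean(values))

# Toy usage with constant action probabilities.
logged = [[(0, 1, 1.0), (0, 1, 1.0)]]
print(is_estimate(logged,
                  target_logp=lambda s, a: np.log(0.8),
                  behavior_logp=lambda s, a: np.log(0.5)))
```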
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
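
One way to read the two-policy idea above is sketched below: a pre-existing guide policy acts for the first steps of each episode and the learning (exploration) policy finishes it, with the hand-off point shrunk over training. The Gymnasium-style environment API and all names are assumptions for illustration, not JSRL's exact implementation.

```python
def jump_start_rollout(env, guide_policy, exploration_policy, guide_steps):
    """One episode in which a pre-existing guide policy acts for the first
    `guide_steps` steps and the learning (exploration) policy finishes the
    episode; shrinking `guide_steps` over training forms a curriculum.
    Gymnasium-style env API assumed for illustration."""
    state, _ = env.reset()
    transitions, done, t = [], False, 0
    while not done:
        policy = guide_policy if t < guide_steps else exploration_policy
        action = policy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        transitions.append((state, action, reward, next_state))
        state, done, t = next_state, terminated or truncated, t + 1
    return transitions
```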
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
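
The fine-tuning loop summarized above can be sketched as a simple hill-climbing random search over the flattened policy parameters, scoring each candidate with deterministic rollouts; the step size, greedy acceptance rule, and function names below are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def random_search_finetune(evaluate, theta0, iters=200, step=0.02, seed=0):
    """Greedy direct random search: perturb the flattened policy parameters,
    score each candidate with `evaluate` (assumed to run deterministic
    rollouts and return the episodic return), and keep improvements."""
    rng = np.random.default_rng(seed)
    theta, best = np.array(theta0, dtype=float), evaluate(theta0)
    for _ in range(iters):
        candidate = theta + step * rng.normal(size=theta.shape)
        score = evaluate(candidate)
        if score > best:
            theta, best = candidate, score
    return theta, best

# Toy usage with a quadratic surrogate in place of rollout returns.
theta, best = random_search_finetune(lambda th: -float(np.sum(th ** 2)), np.ones(4))
print(best)
```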
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Policy Supervectors: General Characterization of Agents by their Behaviour [18.488655590845163]
We propose policy supervectors for characterizing agents by the distribution of states they visit.
Policy supervectors can characterize policies regardless of their design philosophy and scale to thousands of policies on a single workstation.
We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training and imitation learning.
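
A heavily reduced sketch of the underlying idea, characterizing a policy by the distribution of states it visits: here each policy is summarized by per-dimension means and standard deviations of its visited states, whereas the paper builds richer, GMM-based supervectors; all names are illustrative assumptions.

```python
import numpy as np

def state_supervector(visited_states):
    """Summarize a policy by the distribution of states it visits: here a
    simple per-dimension mean/std vector (an illustrative simplification
    of the paper's supervectors)."""
    x = np.asarray(visited_states, dtype=float)
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])

def policy_distance(states_a, states_b):
    # Compare two policies by the distance between their supervectors.
    return float(np.linalg.norm(state_supervector(states_a) - state_supervector(states_b)))

# Toy usage: two batches of visited states from two hypothetical policies.
rng = np.random.default_rng(0)
print(policy_distance(rng.normal(0.0, 1.0, size=(500, 4)),
                      rng.normal(1.0, 2.0, size=(500, 4))))
```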
arXiv Detail & Related papers (2020-12-02T14:43:16Z)
- Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
arXiv Detail & Related papers (2020-02-26T23:00:27Z)