Programmatic Policy Extraction by Iterative Local Search
- URL: http://arxiv.org/abs/2201.06863v1
- Date: Tue, 18 Jan 2022 10:39:40 GMT
- Title: Programmatic Policy Extraction by Iterative Local Search
- Authors: Rasmus Larsen, Mikkel Nørgaard Schmidt
- Abstract summary: We present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy.
Both when trained using a hand-crafted expert policy and a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.
- Score: 0.15229257192293197
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reinforcement learning policies are often represented by neural networks, but
programmatic policies are preferred in some cases because they are more
interpretable, amenable to formal verification, or generalize better. While
efficient algorithms for learning neural policies exist, learning programmatic
policies is challenging. Combining imitation-projection and dataset aggregation
with a local search heuristic, we present a simple and direct approach to
extracting a programmatic policy from a pretrained neural policy. After
examining our local search heuristic on a programming by example problem, we
demonstrate our programmatic policy extraction method on a pendulum swing-up
problem. Both when trained using a hand-crafted expert policy and a learned
neural policy, our method discovers simple and interpretable policies that
perform almost as well as the original.
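To make the pipeline concrete, the loop below is a minimal runnable sketch of imitation-projection with dataset aggregation and hill-climbing local search on a toy 1-D control task. The environment dynamics, the bang-bang program representation, and the stand-in "neural" oracle are illustrative assumptions, not the paper's actual setup:

```python
import random

# Toy stand-ins (illustrative assumptions, not the paper's setup): a 1-D
# environment whose state is a float, a "neural" oracle given by a lambda,
# and programs represented as bang-bang rules  u = a if s < t else b.

def make_program(t, a, b):
    return lambda s: a if s < t else b

def rollout(program, horizon=20):
    s, states = random.uniform(-1.0, 1.0), []
    for _ in range(horizon):
        states.append(s)
        s = max(-1.0, min(1.0, s + 0.1 * program(s)))   # toy dynamics
    return states

def imitation_loss(program, data):
    return sum((program(s) - u) ** 2 for s, u in data) / len(data)

def local_search(params, data, iters=200, step=0.1):
    """Hill climbing over (t, a, b): perturb one parameter at a time and
    keep the edit whenever the imitation loss improves."""
    best, best_loss = params, imitation_loss(make_program(*params), data)
    for _ in range(iters):
        cand = list(best)
        cand[random.randrange(3)] += random.uniform(-step, step)
        loss = imitation_loss(make_program(*cand), data)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best

def extract_program(oracle, rounds=5):
    """Imitation-projection with dataset aggregation: roll out the current
    program, relabel visited states with the oracle, aggregate, re-fit."""
    params, data = [0.0, 1.0, -1.0], []
    for _ in range(rounds):
        data += [(s, oracle(s)) for s in rollout(make_program(*params))]
        params = local_search(params, data)      # projection step
    return params

oracle = lambda s: 1.0 if s < 0 else -1.0        # stand-in neural policy
print(extract_program(oracle))                   # ≈ [0.0, 1.0, -1.0]
```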
Related papers
- Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks [2.1332830068386217]
We propose a language-conditioned semantic search-based method to produce an online search-based policy.
Our approach surpasses the performance of the baselines on the CALVIN benchmark and exhibits strong zero-shot adaptation capabilities.
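As a rough illustration of retrieval-style control (with toy bag-of-words embeddings standing in for the paper's learned language and semantic representations):

```python
from collections import Counter
import math

# Toy retrieval policy: embed the instruction, pick the most similar
# stored demonstration, and replay its action trajectory. The embeddings
# and demo library below are contrived stand-ins.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb or 1.0)

demos = {
    "open the drawer":    ["reach", "grasp_handle", "pull"],
    "push the red block": ["reach", "contact", "push"],
}

def search_policy(instruction):
    key = max(demos, key=lambda d: cosine(embed(instruction), embed(d)))
    return demos[key]                  # replay the retrieved trajectory

print(search_policy("please open that drawer"))  # -> drawer demo actions
```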
arXiv Detail & Related papers (2023-12-10T16:17:00Z)
- Synthesizing Programmatic Policies with Actor-Critic Algorithms and ReLU Networks [20.2777559515384]
Programmatically Interpretable Reinforcement Learning (PIRL) encodes policies in human-readable computer programs.
In this paper, we show that, depending on the language used to encode the programmatic policies, PIRL-specific algorithms are not needed.
We use a connection between ReLU neural networks and oblique decision trees to translate the policy learned with actor-critic algorithms into programmatic policies.
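The underlying connection can be demonstrated on a toy network: each hidden ReLU unit's hyperplane is one oblique split, and every activation pattern (leaf) reduces to a fixed linear action rule. The sketch below, with assumed random weights, checks the equivalence numerically:

```python
import numpy as np

# Toy sizes and random weights (assumptions for illustration): a network
# with 2 inputs, 3 hidden ReLU units, and a linear output head.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
w2, b2 = rng.normal(size=3), 0.0

def net(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def tree(x):
    """Equivalent oblique decision tree: branch on each hidden unit's
    hyperplane, then apply the selected leaf's linear rule."""
    pattern = (W1 @ x + b1) > 0            # path through 3 oblique splits
    # Leaf rule: only active units contribute, so each leaf is linear.
    coef = (w2 * pattern) @ W1
    intercept = (w2 * pattern) @ b1 + b2
    return coef @ x + intercept

x = rng.normal(size=2)
assert np.isclose(net(x), tree(x))         # identical outputs
```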
arXiv Detail & Related papers (2023-08-04T22:17:32Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
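A minimal sketch of the refinement idea, using an assumed three-action bandit with known action values rather than the paper's full offline RL setup: the KL-regularized improvement step has a closed form, and the reference policy is refreshed after each improvement:

```python
import numpy as np

# Toy quantities (assumptions): fixed action values q, a uniform initial
# reference policy, and a KL penalty of strength beta. The regularized
# objective  max_pi  E_pi[q] - beta * KL(pi || ref)  is solved in closed
# form by  pi ∝ ref * exp(q / beta).

q = np.array([1.0, 0.2, -0.5])
ref = np.ones(3) / 3                     # start from the behavior estimate
beta = 1.0

for outer in range(5):
    pi = ref * np.exp(q / beta)          # KL-regularized improvement
    pi /= pi.sum()
    ref = pi                             # iterative refinement: new anchor
    print(outer, pi.round(3))
```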
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
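A minimal sketch of the jump-start mechanism on an assumed toy chain task (the guide policy, learner stand-in, and success-based curriculum are illustrative choices, not JSRL's exact schedule):

```python
import random

# Toy chain task (assumption): move right (a=1) from state 0 to goal 10.
def reset():
    return 0

def env_step(s, a):
    s2 = s + (1 if a == 1 else -1)
    return s2, float(a == 1), s2 >= 10

guide = lambda s: 1                            # competent pre-existing policy
explorer = lambda s: random.choice([0, 1])     # learning-policy stand-in

def jsrl_episode(h, horizon=50):
    """The guide drives the first h steps; the learner finishes."""
    s, ret = reset(), 0.0
    for t in range(horizon):
        a = guide(s) if t < h else explorer(s)  # switch at step h
        s, r, done = env_step(s, a)
        ret += r
        if done:
            return ret, True
    return ret, False

h = 10                                          # start fully guided
for _ in range(500):
    ret, success = jsrl_episode(h)
    # (the learner's policy update would go here)
    if success and h > 0:
        h -= 1                                  # curriculum: shrink jump-start
print("final jump-start length:", h)
```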
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
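As a toy illustration of searching a program space for a policy (a hypothetical two-rule mini-DSL, far smaller than the paper's language, scored by brute-force enumeration rather than learned synthesis):

```python
import itertools
import random

# Hypothetical mini-DSL (assumption): programs are "if <predicate> then
# <action> else <action>" rules over a 1-D state.
PREDICATES = {"x<0": lambda s: s < 0, "x>0": lambda s: s > 0}
ACTIONS = [-1.0, 1.0]

def run(program, episodes=20, horizon=30):
    """Score a program by rollouts on a toy task: stay near the origin."""
    pred, a_then, a_else = program
    total = 0.0
    for _ in range(episodes):
        s = random.uniform(-1, 1)
        for _ in range(horizon):
            a = a_then if PREDICATES[pred](s) else a_else
            s += 0.1 * a
            total -= abs(s)            # penalty for drifting away
    return total

best = max(itertools.product(PREDICATES, ACTIONS, ACTIONS), key=run)
print("if", best[0], "then", best[1], "else", best[2])
```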
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
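A structural sketch of such a policy, with assumed toy numbers: a hard, human-readable selection over a few simple experts (the non-differentiable argmax is exactly the step the paper addresses), alongside a separate, unconstrained critic:

```python
import numpy as np

# Toy stand-ins (assumptions): three anchor states acting as interpretable
# "cases", one constant expert per case, and a critic function standing in
# for the complex value approximator kept outside the policy.
anchors = np.array([-1.0, 0.0, 1.0])
experts = [lambda s: 1.0, lambda s: 0.0, lambda s: -1.0]

def policy(s):
    k = int(np.argmin(np.abs(anchors - s)))   # hard, non-differentiable
    return experts[k](s)                       # concise per-case rule

critic = lambda s: -s * s                      # unconstrained approximator

print(policy(-0.8), policy(0.9), critic(0.5))
```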
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
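A minimal sketch of the switching idea with assumed toy quantities: trust the model-based controller while perception uncertainty is low, and hand control to the learned policy otherwise:

```python
import random

# Toy stand-ins (assumptions): a scalar state, a pose estimate with a
# variance reported by the perception system, and a random learned policy.

def model_based(s, goal_est):
    return 1.0 if goal_est > s else -1.0        # controller toward estimate

def learned(obs):
    return random.choice([-1.0, 1.0])           # RL-policy stand-in

def combined_action(s, obs, goal_est, goal_var, threshold=0.1):
    if goal_var < threshold:                    # perception is trustworthy
        return model_based(s, goal_est)
    return learned(obs)                         # fall back to learning

print(combined_action(0.0, obs=None, goal_est=1.0, goal_var=0.02))
print(combined_action(0.0, obs=None, goal_est=1.0, goal_var=0.50))
```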
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
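A toy sketch of the regularizer (stand-in action distributions, no RNN): the exploration policy is penalized for diverging from an informed policy that has privileged access to the current task's identity:

```python
import numpy as np

# Assumptions for illustration: two tasks with known per-task "informed"
# action distributions, and a fixed policy output standing in for the
# RNN-based exploration policy.

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

informed = {0: np.array([0.9, 0.1]),
            1: np.array([0.1, 0.9])}

def regularized_loss(pi, task_id, task_reward, alpha=0.1):
    # Objective: maximize reward while staying close to the informed policy.
    return -task_reward + alpha * kl(pi, informed[task_id])

pi = np.array([0.5, 0.5])               # current policy output (stand-in)
print(regularized_loss(pi, task_id=0, task_reward=1.0))
```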
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
- Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
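A sketch of the fingerprinting idea under assumed toy policies: summarize a policy by its actions on a small set of probe states, then map that embedding to a predicted return with a stand-in linear evaluator:

```python
import numpy as np

# Assumptions for illustration: random probe states and a random linear
# evaluator standing in for the learned policy evaluation network.
rng = np.random.default_rng(0)
probes = rng.uniform(-1, 1, size=5)              # probing states

def fingerprint(policy):
    """Concise embedding: the policy's actions on the probe states."""
    return np.array([policy(s) for s in probes])

w = rng.normal(size=5)                            # evaluator stand-in
def predicted_return(policy):
    return float(w @ fingerprint(policy))

print(predicted_return(lambda s: -s))             # score one policy
print(predicted_return(lambda s: np.tanh(2 * s))) # score another
```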
arXiv Detail & Related papers (2020-02-26T23:00:27Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
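A toy sketch of reward conditioning with contrived data and a nearest-neighbour stand-in for the supervised learner: log random behavior together with the returns it happened to achieve, then act by querying for a high target return:

```python
import numpy as np

# Contrived logged data (assumptions): random states and actions, with a
# return function that is best when  a = -s. No expert demonstrations.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, 500)
actions = rng.uniform(-1, 1, 500)
returns = 1.0 - (actions + states) ** 2

def policy(s, target_return=1.0):
    # Supervised stand-in: the logged action whose (state, return) pair
    # is closest to the queried (state, target_return) wins.
    d = (states - s) ** 2 + (returns - target_return) ** 2
    return actions[int(np.argmin(d))]

print(policy(0.5))    # ≈ -0.5 when conditioned on a high target return
```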
arXiv Detail & Related papers (2019-12-31T18:07:43Z)