Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2310.19137v1
- Date: Sun, 29 Oct 2023 19:59:55 GMT
- Title: Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep
Reinforcement Learning
- Authors: Suraj Singireddy, Andre Beckus, George Atia, Sumit Jha, Alvaro
Velasquez
- Abstract summary: Reinforcement learning (RL) is a powerful tool for finding optimal policies in sequential decision processes.
Deep RL methods suffer from two weaknesses: collecting the amount of agent experience required for practical RL problems is prohibitively expensive, and the learned policies exhibit poor generalization on tasks outside of the training distribution.
We introduce automaton distillation, a form of neuro-symbolic transfer learning in which Q-value estimates from a teacher are distilled into a low-dimensional representation in the form of an automaton.
- Score: 11.31386674125334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a powerful tool for finding optimal policies
in sequential decision processes. However, deep RL methods suffer from two
weaknesses: collecting the amount of agent experience required for practical RL
problems is prohibitively expensive, and the learned policies exhibit poor
generalization on tasks outside of the training distribution. To mitigate these
issues, we introduce automaton distillation, a form of neuro-symbolic transfer
learning in which Q-value estimates from a teacher are distilled into a
low-dimensional representation in the form of an automaton. We then propose two
methods for generating Q-value estimates: static transfer, which reasons over
an abstract Markov Decision Process constructed based on prior knowledge, and
dynamic transfer, where symbolic information is extracted from a teacher Deep
Q-Network (DQN). The resulting Q-value estimates from either method are used to
bootstrap learning in the target environment via a modified DQN loss function.
We list several failure modes of existing automaton-based transfer methods and
demonstrate that both static and dynamic automaton distillation decrease the
time required to find optimal policies for various decision tasks.
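The loss modification can be pictured as blending the teacher's automaton-level Q-value estimate into the student's TD target. The sketch below is a minimal PyTorch illustration under that assumption only; the names (automaton_q, mix_weight) and the exact blending rule are illustrative, not the paper's.

```python
# Minimal sketch of distilling automaton-level Q-value estimates into a DQN
# target (assumed blending rule; the paper's exact loss may differ).
import torch
import torch.nn as nn

def automaton_distillation_loss(q_net, target_net, batch, automaton_q,
                                gamma=0.99, mix_weight=0.5):
    """Standard DQN TD target blended with a teacher Q-value looked up from
    the automaton state reached after each transition (hypothetical rule;
    mix_weight would typically be annealed toward 0 during training)."""
    s, a, r, s_next, done, u_next = batch   # u_next: automaton state index
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        bellman = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
        teacher = automaton_q[u_next]        # distilled estimate per automaton state
        target = (1 - mix_weight) * bellman + mix_weight * teacher

    return nn.functional.mse_loss(q_sa, target)

# Toy usage with random tensors (4-dim states, 3 actions, 5 automaton states).
q_net, target_net = nn.Linear(4, 3), nn.Linear(4, 3)
automaton_q = torch.randn(5)
batch = (torch.randn(8, 4), torch.randint(0, 3, (8,)), torch.randn(8),
         torch.randn(8, 4), torch.zeros(8), torch.randint(0, 5, (8,)))
loss = automaton_distillation_loss(q_net, target_net, batch, automaton_q)
loss.backward()
```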
Related papers
- DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation [10.645244994430483]
We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework.
We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making.
As demonstrated in tasks like Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
arXiv Detail & Related papers (2024-10-15T07:09:56Z)
- Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values [8.694989771294013]
Policy gradient methods can still be useful in many domains, as long as we can work out how to exploit them in a sample-efficient way.
We explore the chaotic nature of DQNs in reinforcement learning and examine how the information they retain after training can be repurposed to adapt a model to different tasks.
arXiv Detail & Related papers (2024-07-14T21:28:27Z)
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, which retains the pre-trained knowledge of the VLMs while adapting to new tasks.
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
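As a concrete picture of the GBM route, one common recipe is behavioral cloning: label visited states with the expert policy's greedy actions and fit a gradient-boosted classifier to them. The sketch below assumes that recipe (the cited paper's exact procedure may differ) and uses a stand-in linear "expert" in place of a trained RL policy.

```python
# Hedged sketch: distill a (stand-in) expert policy into a gradient boosting
# machine via behavioral cloning; the cited paper's procedure may differ.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                  # stand-in for a trained neural policy

def expert_action(states):
    return np.argmax(states @ W, axis=1)     # greedy action of the "expert"

states = rng.normal(size=(5000, 4))          # states visited by the expert
actions = expert_action(states)

gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3)
gbm.fit(states, actions)                     # interpretable surrogate policy

test = rng.normal(size=(1000, 4))
agreement = np.mean(gbm.predict(test) == expert_action(test))
print(f"action agreement with expert: {agreement:.2%}")
```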
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors.
We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS)
LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
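The cited paper learns its discretization adaptively from data; as a minimal stand-in, the sketch below quantizes continuous actions from an offline dataset with k-means, which is one simple way to obtain a discrete action vocabulary for methods such as IQL or CQL.

```python
# Minimal sketch: quantize continuous actions from an offline dataset into a
# discrete vocabulary with k-means (a simple stand-in for the paper's
# adaptive, learned quantization scheme).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=(10000, 6))   # continuous actions from a dataset

n_bins = 64
kmeans = KMeans(n_clusters=n_bins, n_init=10, random_state=0).fit(actions)

# Each continuous action is replaced by the index of its nearest centroid;
# a discrete-action offline RL method then learns Q-values over these indices.
codes = kmeans.predict(actions)
reconstructed = kmeans.cluster_centers_[codes]
print("mean reconstruction error:", np.abs(actions - reconstructed).mean())
```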
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by an agent by learning an automaton model of the environment's behavior under the agent's control.
In this work, we extend the capabilities of automata learning so that it is possible to learn models of environments with complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT).
Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms.
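X-QL is usually described as replacing the explicit max over actions with a Gumbel-regression fit of the soft maximum value. The sketch below writes that objective as it is commonly presented; the scaling and clipping details of the actual implementation may differ.

```python
# Hedged sketch of the Gumbel-regression objective commonly used to describe
# Extreme Q-Learning: fit V(s) to the soft maximum of Q(s, a) without an
# explicit max over actions.
import math
import torch

def gumbel_regression_loss(q_values, v_values, beta=1.0):
    """L(V) = E[exp(z) - z - 1] with z = (Q - V) / beta; minimized when V
    approximates the soft (log-mean-exp style) maximum of Q."""
    z = (q_values - v_values) / beta
    z = torch.clamp(z, max=5.0)              # numerical safeguard (assumption)
    return (torch.exp(z) - z - 1.0).mean()

# Toy check: a learnable scalar V is fit against noisy Q samples.
q = torch.randn(1024) * 0.5 + 1.0
v = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([v], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = gumbel_regression_loss(q, v.expand_as(q), beta=1.0)
    loss.backward()
    opt.step()

target = torch.logsumexp(q, dim=0).item() - math.log(q.numel())
print(f"fitted V: {v.item():.3f}   log-mean-exp of Q: {target:.3f}")
```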
arXiv Detail & Related papers (2023-01-05T23:14:38Z)
- Learning Task Automata for Reinforcement Learning using Hidden Markov Models [37.69303106863453]
This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state task automata.
We learn a product MDP, a model composed of the specification's automaton and the environment's MDP, by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models.
Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy.
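The cited pipeline fits a structured product MDP, but its core statistical step is a Baum-Welch (EM) fit of a hidden-state model to observed symbol sequences. A generic, hedged illustration of that step with hmmlearn's CategoricalHMM (not the paper's code, and assuming hmmlearn provides CategoricalHMM as in recent releases):

```python
# Generic illustration of the Baum-Welch step: fit hidden states to observed
# event-symbol sequences with hmmlearn (not the paper's product-MDP learner).
import numpy as np
from hmmlearn.hmm import CategoricalHMM

rng = np.random.default_rng(0)
# Two traces of discrete "event" symbols observed while the agent acts
# (e.g., 0 = nothing, 1 = picked up key, 2 = opened door).
traces = [rng.integers(0, 3, size=60), rng.integers(0, 3, size=45)]

X = np.concatenate(traces).reshape(-1, 1)   # hmmlearn expects a column of symbols
lengths = [len(t) for t in traces]

hmm = CategoricalHMM(n_components=4, n_iter=200, random_state=0)
hmm.fit(X, lengths)                         # Baum-Welch / EM

# Decoded hidden states play the role of candidate task-automaton states.
hidden = hmm.predict(X, lengths)
print("log-likelihood:", hmm.score(X, lengths))
print("first decoded states:", hidden[:10])
```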
arXiv Detail & Related papers (2022-08-25T02:58:23Z)
- Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates [110.92598350897192]
Q-Learning has proven effective at learning a policy to perform control tasks.
However, estimation noise becomes a bias after the max operator in the policy improvement step.
We present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state Markov Decision Processes.
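The bias mentioned above is easy to see numerically: even when the per-action Q estimates carry unbiased noise, taking the max of those noisy estimates is biased upward. A small self-contained check:

```python
# Numerical illustration: unbiased noise on per-action Q estimates becomes a
# positive bias once the max operator is applied.
import numpy as np

rng = np.random.default_rng(0)
true_q = np.array([1.0, 1.0, 1.0, 1.0])       # all actions equally good
noise_std = 0.5
n_trials = 100_000

noisy = true_q + rng.normal(0.0, noise_std, size=(n_trials, true_q.size))
mean_max_estimate = noisy.max(axis=1).mean()   # E[max Q_hat]
true_max = true_q.max()                        # max E[Q_hat]

print(f"true max Q        : {true_max:.3f}")
print(f"mean of max(Q_hat): {mean_max_estimate:.3f}  "
      f"(upward bias of about {mean_max_estimate - true_max:.3f})")
```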
arXiv Detail & Related papers (2021-10-28T00:07:19Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse- and dense-reward settings, using robotic manipulation and navigation benchmarks, show that it considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)