Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2310.19137v1
- Date: Sun, 29 Oct 2023 19:59:55 GMT
- Title: Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep
Reinforcement Learning
- Authors: Suraj Singireddy, Andre Beckus, George Atia, Sumit Jha, Alvaro
Velasquez
- Abstract summary: Reinforcement learning (RL) is a powerful tool for finding optimal policies in sequential decision processes.
Deep RL methods suffer from two weaknesses: collecting the amount of agent experience required for practical RL problems is prohibitively expensive, and the learned policies exhibit poor generalization on tasks outside of the training distribution.
We introduce automaton distillation, a form of neuro-symbolic transfer learning in which Q-value estimates from a teacher are distilled into a low-dimensional representation in the form of an automaton.
- Score: 11.31386674125334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a powerful tool for finding optimal policies
in sequential decision processes. However, deep RL methods suffer from two
weaknesses: collecting the amount of agent experience required for practical RL
problems is prohibitively expensive, and the learned policies exhibit poor
generalization on tasks outside of the training distribution. To mitigate these
issues, we introduce automaton distillation, a form of neuro-symbolic transfer
learning in which Q-value estimates from a teacher are distilled into a
low-dimensional representation in the form of an automaton. We then propose two
methods for generating Q-value estimates: static transfer, which reasons over
an abstract Markov Decision Process constructed based on prior knowledge, and
dynamic transfer, where symbolic information is extracted from a teacher Deep
Q-Network (DQN). The resulting Q-value estimates from either method are used to
bootstrap learning in the target environment via a modified DQN loss function.
We list several failure modes of existing automaton-based transfer methods and
demonstrate that both static and dynamic automaton distillation decrease the
time required to find optimal policies for various decision tasks.
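As a rough illustration of the idea, the sketch below blends an automaton-derived teacher estimate into a standard one-step DQN target. It is not the paper's exact loss: the names (`distillation_dqn_loss`, `automaton_q`, the blending weight `beta`, and the batch layout) are assumptions made for the example, and `automaton_q` stands in for whichever of the static or dynamic transfer estimates is being distilled.

```python
import torch
import torch.nn.functional as F

def distillation_dqn_loss(q_net, target_net, batch, automaton_q, gamma=0.99, beta=0.5):
    """One-step DQN loss augmented with an automaton-derived bootstrap term (illustrative).

    automaton_q is a hypothetical table of teacher Q-value estimates indexed by
    (automaton state, action); it stands in for the static or dynamic transfer estimates.
    """
    s, a, r, s_next, done, u_next = batch  # u_next: automaton state reached by the transition

    # Online network's Q-value for the action actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Usual bootstrapped target from the target network.
        env_target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
        # Teacher estimate looked up from the distilled automaton.
        teacher_target = automaton_q[u_next, a]

    # Blend environment and teacher targets.
    target = beta * teacher_target + (1 - beta) * env_target
    return F.mse_loss(q_sa, target)
```

In practice one would likely anneal `beta` toward zero, so the teacher's estimates only bootstrap early learning and the student's own experience dominates later.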
Related papers
- Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values [8.694989771294013]
Policy gradient methods can still be useful in many domains, provided they can be exploited in a sample-efficient way.
We explore the chaotic nature of DQNs in reinforcement learning and examine how the information they retain after training can be repurposed to adapt a model to different tasks.
arXiv Detail & Related papers (2024-07-14T21:28:27Z)
- Equivariant Offline Reinforcement Learning [7.822389399560674]
We investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations.
Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts.
arXiv Detail & Related papers (2024-06-20T03:02:49Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors.
We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), which learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
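The summary above does not spell out the quantization scheme. As a hedged stand-in for the general idea, the snippet below builds a discrete action codebook from an offline dataset with plain k-means; the function names and the use of k-means are illustrative assumptions, not the paper's adaptive method.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(dataset_actions, n_bins=32, seed=0):
    """Cluster the continuous actions in an offline dataset into a discrete codebook.

    Simple stand-in for action quantization in general; the paper's adaptive
    scheme is not reproduced here.
    """
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(dataset_actions)
    return km.cluster_centers_  # shape: (n_bins, action_dim)

def quantize(action, codebook):
    """Map a continuous action to the index of its nearest codebook entry."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))
```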
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insight into the decisions faced by an agent by learning an automaton model of the environment's behavior under the agent's control.
In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning [11.124310650599146]
We develop a new methodology for using gradient-based explainability techniques, specifically TracIn, to improve model performance in the parameter-efficient tuning (PET) setting.
arXiv Detail & Related papers (2023-02-13T18:54:58Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates [110.92598350897192]
Q-Learning has proven effective at learning a policy to perform control tasks.
However, estimation noise becomes a bias after the max operator in the policy improvement step.
We present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state Markov Decision Processes.
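The bias mentioned above is easy to reproduce numerically: even when every action has the same true value, taking a max over noisy Q estimates yields a positive expected error. A minimal demonstration of that effect (not the UQL algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(0)

true_q = np.zeros(5)      # all 5 actions are equally good: the true max is 0
noise_std = 1.0
n_trials = 100_000

# Add independent estimation noise, then take the max over actions each trial.
noisy = true_q + rng.normal(0.0, noise_std, size=(n_trials, true_q.size))
print(noisy.max(axis=1).mean())   # roughly 1.16 > 0: the max of noisy estimates is biased upward
```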
arXiv Detail & Related papers (2021-10-28T00:07:19Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Induction and Exploitation of Subgoal Automata for Reinforcement Learning [75.55324974788475]
We present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks.
ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals.
A subgoal automaton also includes two special states: one indicating the successful completion of the task, and one indicating that the task has finished without succeeding.
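For readers unfamiliar with the construct, a minimal sketch of a subgoal automaton as a data structure is shown below; the class, field, and state names are illustrative assumptions, not taken from ISA.

```python
from dataclasses import dataclass, field

@dataclass
class SubgoalAutomaton:
    """Deterministic automaton whose edges are labeled by task subgoals."""
    initial_state: str
    accepting_state: str                              # task completed successfully
    rejecting_state: str                              # task finished without succeeding
    transitions: dict = field(default_factory=dict)   # (state, subgoal) -> next state

    def step(self, state, observed_subgoals):
        """Advance on the first observed subgoal that labels an outgoing edge."""
        for subgoal in observed_subgoals:
            nxt = self.transitions.get((state, subgoal))
            if nxt is not None:
                return nxt
        return state  # no labeled edge fired; stay in the current automaton state

# Example: reach the key, then the door; touching lava fails the task.
automaton = SubgoalAutomaton(
    initial_state="u0", accepting_state="u_acc", rejecting_state="u_rej",
    transitions={("u0", "key"): "u1", ("u1", "door"): "u_acc",
                 ("u0", "lava"): "u_rej", ("u1", "lava"): "u_rej"},
)
```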
arXiv Detail & Related papers (2020-09-08T16:42:55Z)