Efficient Symbolic Policy Learning with Differentiable Symbolic
Expression
- URL: http://arxiv.org/abs/2311.02104v1
- Date: Thu, 2 Nov 2023 03:27:51 GMT
- Title: Efficient Symbolic Policy Learning with Differentiable Symbolic
Expression
- Authors: Jiaming Guo, Rui Zhang, Shaohui Peng, Qi Yi, Xing Hu, Ruizhi Chen,
Zidong Du, Xishan Zhang, Ling Li, Qi Guo, Yunji Chen
- Abstract summary: We propose an efficient gradient-based learning method that learns the symbolic policy from scratch in an end-to-end way.
In addition, in contrast with previous symbolic policies, which only work in single-task RL because of their complexity, we extend ESPL to meta-RL to generate symbolic policies for unseen tasks.
- Score: 30.855457609733637
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep reinforcement learning (DRL) has led to a wide range of advances in
sequential decision-making tasks. However, the complexity of neural network
policies makes them difficult to understand and to deploy under limited
computational resources. Currently, employing compact symbolic expressions as
symbolic policies is a promising strategy for obtaining simple and interpretable
policies. Previous symbolic policy methods usually involve complex training
processes and pre-trained neural network policies, which are inefficient and
limit the application of symbolic policies. In this paper, we propose an
efficient gradient-based learning method named Efficient Symbolic Policy
Learning (ESPL) that learns the symbolic policy from scratch in an end-to-end
way. We introduce a symbolic network as the search space and employ a path
selector to find the compact symbolic policy. In this way, we represent the
policy with a differentiable symbolic expression and train it in an off-policy
manner, which further improves efficiency. In addition, in contrast with
previous symbolic policies, which only work in single-task RL because of their
complexity, we extend ESPL to meta-RL to generate symbolic policies for unseen
tasks. Experimentally, we show that our approach generates symbolic policies
with higher performance and greatly improves data efficiency in single-task RL.
In meta-RL, we demonstrate that, compared with neural network policies, the
proposed symbolic policy achieves higher performance and efficiency and shows
the potential to be interpretable.
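To make the abstract's components more concrete — a symbolic network as the search space, a path selector that prunes it to a compact expression, and end-to-end differentiability that allows off-policy training — here is a minimal illustrative sketch. It is written under assumed design choices: the primitive set, the sigmoid-relaxed gates, and the sparsity penalty are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class SymbolicLayer(nn.Module):
    """One layer of primitive operators applied to learned linear mixes of the input."""

    def __init__(self, in_dim: int, n_units: int = 4):
        super().__init__()
        # Each unary primitive gets its own linear projection of the inputs.
        self.unary = [torch.sin, torch.cos, torch.tanh, lambda z: z]
        self.lin_unary = nn.Linear(in_dim, n_units * len(self.unary))
        # One multiplicative (product) unit serves as a simple binary primitive.
        self.lin_prod = nn.Linear(in_dim, 2 * n_units)
        self.out_dim = n_units * (len(self.unary) + 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = self.lin_unary(x).chunk(len(self.unary), dim=-1)
        unary_out = torch.cat([f(c) for f, c in zip(self.unary, chunks)], dim=-1)
        a, b = self.lin_prod(x).chunk(2, dim=-1)
        return torch.cat([unary_out, a * b], dim=-1)


class SymbolicPolicy(nn.Module):
    """A symbolic layer plus per-weight gates acting as a relaxed path selector."""

    def __init__(self, obs_dim: int, act_dim: int, n_units: int = 4):
        super().__init__()
        self.layer = SymbolicLayer(obs_dim, n_units)
        self.head = nn.Linear(self.layer.out_dim, act_dim)
        # Gate logits: sigmoid(logit) ~ probability that a head weight is kept.
        self.gate_logits = nn.Parameter(torch.zeros(act_dim, self.layer.out_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        feats = self.layer(obs)
        gates = torch.sigmoid(self.gate_logits)   # soft path selection
        return feats @ (self.head.weight * gates).t() + self.head.bias

    def sparsity_penalty(self) -> torch.Tensor:
        # Pushes most gates toward zero so only a compact expression survives.
        return torch.sigmoid(self.gate_logits).sum()


if __name__ == "__main__":
    policy = SymbolicPolicy(obs_dim=3, act_dim=1)
    obs = torch.randn(8, 3)
    action = policy(obs)
    # The whole pipeline is differentiable, so it can be plugged into an
    # off-policy actor-critic objective together with the sparsity penalty.
    loss = action.pow(2).mean() + 1e-3 * policy.sparsity_penalty()
    loss.backward()
    print(action.shape, float(loss))
```

Once training drives most gates toward zero, the surviving operators and weights can in principle be read off as a closed-form expression over the observation variables; the paper's actual selector and primitive library may differ.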
Related papers
- SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning [9.035959289139102]
We introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL.
SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions.
We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches.
arXiv Detail & Related papers (2024-08-16T14:04:40Z)
- End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations [15.530907808235945]
We present a neuro-symbolic framework for jointly learning structured states and symbolic policies.
We design a pipeline to prompt GPT-4 to generate textual explanations for the learned policies and decisions.
We verify the efficacy of our approach on nine Atari tasks and present GPT-generated explanations for policies and decisions.
arXiv Detail & Related papers (2024-03-19T05:21:20Z)
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search [63.3745291252038]
We propose DiffSES, a novel symbolic learning approach that discovers discrete symbolic policies.
By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions.
Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods.
arXiv Detail & Related papers (2022-12-30T17:50:54Z)
- Symbolic Distillation for Learned TCP Congestion Control [70.27367981153299]
TCP congestion control has achieved tremendous success with deep reinforcement learning (RL) approaches.
Black-box policies lack interpretability and reliability, and they often need to operate outside the traditional TCP datapath.
This paper proposes a novel two-stage solution to achieve the best of both worlds: first train a deep RL agent, then distill its NN policy into white-box, light-weight rules (a generic sketch of this distillation step appears after this list).
arXiv Detail & Related papers (2022-10-24T00:58:16Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Neuro-Symbolic Reinforcement Learning with First-Order Logic [63.003353499732434]
We propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network.
Our experimental results show RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.
arXiv Detail & Related papers (2021-10-21T08:21:49Z)
- Neurosymbolic Reinforcement Learning with Formally Verified Exploration [21.23874800091344]
We present Revel, a framework for provably safe exploration in continuous state and action spaces.
A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible.
We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification.
arXiv Detail & Related papers (2020-09-26T14:51:04Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is best to reuse for the target policy, and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
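As referenced in the Symbolic Distillation entry above, the following is a hedged sketch of the generic "train a deep RL teacher, then distill its policy into white-box rules" recipe. The teacher function, the feature names, and the decision-tree student are illustrative placeholders, not the paper's actual congestion-control setup.

```python
# Stage 1: collect (state, action) pairs from a trained black-box policy.
# Stage 2: fit a small white-box model (here, a shallow decision tree) on them.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text


def teacher_policy(state: np.ndarray) -> np.ndarray:
    """Stand-in for a trained deep RL policy (hypothetical, for illustration)."""
    return np.tanh(state @ np.array([0.5, -1.0, 0.25]))


# Stage 1: in a real setting, states would come from on-policy rollouts.
states = np.random.randn(5000, 3)
actions = teacher_policy(states)

# Stage 2: distill into a shallow tree, i.e. a set of interpretable if-then rules.
student = DecisionTreeRegressor(max_depth=4)
student.fit(states, actions)

print("distillation MSE:", np.mean((student.predict(states) - actions) ** 2))
print(export_text(student, feature_names=["rtt", "loss_rate", "throughput"]))
```

The shallow tree is only one possible choice of light-weight rule set; the cited paper's actual distillation target and procedure should be taken from the original text.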