Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
- URL: http://arxiv.org/abs/2408.08761v4
- Date: Wed, 05 Feb 2025 07:41:14 GMT
- Title: Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
- Authors: Sascha Marton, Tim Grams, Florian Vogt, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
- Abstract summary: We introduce SYMPOL, a novel method for SYMbolic tree-based on-policy RL.
SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions.
We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches.
- Abstract: Reinforcement learning (RL) has seen significant success across various domains, but its adoption is often limited by the black-box nature of neural network policies, making them difficult to interpret. In contrast, symbolic policies allow representing decision-making strategies in a compact and interpretable way. However, learning symbolic policies directly within on-policy methods remains challenging. In this paper, we introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL. SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions while maintaining a high level of interpretability. We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches in terms of performance and interpretability. Unlike existing methods, it enables gradient-based, end-to-end learning of interpretable, axis-aligned decision trees within standard on-policy RL algorithms. Therefore, SYMPOL can become the foundation for a new class of interpretable RL based on decision trees. Our implementation is available under: https://github.com/s-marton/sympol
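The abstract's central claim, gradient-based end-to-end learning of axis-aligned trees inside standard on-policy RL, can be conveyed in a short sketch. The following is an illustrative reconstruction, not the authors' implementation (that lives at the linked repository): feature choices are relaxed with a softmax and split gates with sigmoids so a REINFORCE-style loss can flow through the tree. All names, initializations, and the stand-in data are invented.

```python
import torch
import torch.nn as nn

class SoftTreePolicy(nn.Module):
    """Axis-aligned soft decision tree over discrete actions (illustrative)."""

    def __init__(self, obs_dim, n_actions, depth=2, temperature=0.1):
        super().__init__()
        self.depth, self.tau = depth, temperature
        n_internal, n_leaves = 2 ** depth - 1, 2 ** depth
        # Relaxed feature choice and split threshold per internal node; action
        # logits per leaf.
        self.feat_logits = nn.Parameter(0.1 * torch.randn(n_internal, obs_dim))
        self.thresholds = nn.Parameter(torch.zeros(n_internal))
        self.leaf_logits = nn.Parameter(0.1 * torch.randn(n_leaves, n_actions))

    def leaf_probs(self, obs):
        probs = torch.ones(obs.shape[0], 1)
        node = 0
        for level in range(self.depth):
            n = 2 ** level
            idx = torch.arange(node, node + n)
            feat = torch.softmax(self.feat_logits[idx], dim=-1)  # soft feature pick
            right = torch.sigmoid((obs @ feat.T - self.thresholds[idx]) / self.tau)
            probs = torch.stack([probs * (1 - right), probs * right], dim=-1)
            probs = probs.reshape(obs.shape[0], 2 * n)           # children, in order
            node += n
        return probs                                             # (batch, 2**depth)

    def dist(self, obs):
        p = self.leaf_probs(obs) @ torch.softmax(self.leaf_logits, dim=-1)
        return torch.distributions.Categorical(probs=p)

policy = SoftTreePolicy(obs_dim=4, n_actions=2)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

obs = torch.randn(32, 4)       # stand-in for states collected on-policy
dist = policy.dist(obs)
actions = dist.sample()
returns = torch.randn(32)      # stand-in for computed returns/advantages
loss = -(dist.log_prob(actions) * returns).mean()  # REINFORCE-style loss
opt.zero_grad(); loss.backward(); opt.step()
```

Hardening the relaxations after training (argmax feature, step-function split, argmax leaf action) reads the model back out as a crisp axis-aligned tree, which is in the spirit of the interpretability the paper emphasizes.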
Related papers
- Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning [27.868175900131313]
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state.
This paper postulates multi-linear mappings to efficiently estimate the parameters of the RL policy.
We leverage the PARAFAC decomposition to design tensor low-rank policies.
arXiv Detail & Related papers (2025-01-08T23:22:08Z)
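A hedged sketch of the PARAFAC/CP idea above, with invented sizes (not the paper's code): action logits over a discretized three-dimensional state space are the sum of rank-one tensors, so the parameter count grows additively rather than multiplicatively in the state dimensions.

```python
import torch

dims, n_actions, rank = (10, 10, 10), 4, 5          # invented sizes
factors = [torch.randn(d, rank, requires_grad=True) for d in dims]
action_factor = torch.randn(n_actions, rank, requires_grad=True)

def logits(state_idx):
    """Action logits for a discretized state given as one index per dimension."""
    w = torch.ones(rank)
    for f, i in zip(factors, state_idx):
        w = w * f[i]                                 # elementwise product per mode
    return action_factor @ w                         # (n_actions,)

probs = torch.softmax(logits((3, 7, 1)), dim=0)
# A full table needs 10*10*10*4 = 4000 entries; the CP form uses
# (10 + 10 + 10 + 4) * 5 = 170 parameters, linear in each dimension.
```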
- Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies [51.03989561425833]
We propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning.
The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training.
We show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model.
arXiv Detail & Related papers (2025-01-07T15:51:49Z)
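A hedged sketch of one way to make a DNF differentiable (a generic product t-norm construction with invented gate parameters, not necessarily the neural DNF-MT semantics): sigmoid gates decide which literals each conjunction uses, so the formula trains with gradients and can be thresholded back into boolean rules.

```python
import torch
import torch.nn as nn

class SoftDNF(nn.Module):
    """One differentiable DNF formula: OR over soft conjunctions of literals."""

    def __init__(self, n_atoms, n_conjuncts):
        super().__init__()
        self.pos = nn.Parameter(torch.randn(n_conjuncts, n_atoms))  # gate: atom
        self.neg = nn.Parameter(torch.randn(n_conjuncts, n_atoms))  # gate: NOT atom

    def forward(self, x):                    # x in [0, 1]: soft truth of atoms
        pos_g, neg_g = torch.sigmoid(self.pos), torch.sigmoid(self.neg)
        x = x.unsqueeze(1)                   # (batch, 1, n_atoms)
        # A closed gate (0) contributes a neutral factor of 1.
        lit = (1 - pos_g * (1 - x)) * (1 - neg_g * x)
        conj = lit.prod(dim=-1)              # soft AND per rule: (batch, n_conjuncts)
        return 1 - (1 - conj).prod(dim=-1)   # soft OR over rules: (batch,)

dnf = SoftDNF(n_atoms=4, n_conjuncts=3)
truth = dnf(torch.rand(8, 4))                # one activation value per input
```

Thresholding the gates after training yields plain boolean rules, which is what makes such a policy inspectable and hand-editable in the sense the summary describes.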
- In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search [6.74890780471356]
We present an approach to synthesise optimal decision-tree policies given a deterministic black-box environment.
Our approach is a specialised search algorithm which systematically explores the space of decision trees under the given discretisation.
arXiv Detail & Related papers (2024-09-05T05:51:42Z)
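A toy, hedged illustration of the black-box setting (brute-force enumeration stands in for the paper's specialised search; the environment, horizon, and threshold grid are invented): candidate depth-1 trees are scored purely by rolling them out in the deterministic system.

```python
import itertools

def rollout(tree, env_step, s0, horizon=50):
    s, total = s0, 0.0
    for _ in range(horizon):
        feat, thr, a_left, a_right = tree
        a = a_left if s[feat] <= thr else a_right   # axis-aligned split
        s, r = env_step(s, a)
        total += r
    return total

def toy_step(s, a):                       # deterministic 1-D black box
    x, v = s
    v += 0.1 if a == 1 else -0.1
    x += v
    return (x, v), -abs(x)                # reward: stay near the origin

grid = [i / 5 - 1.0 for i in range(11)]   # candidate thresholds in [-1, 1]
best = max(
    itertools.product(range(2), grid, range(2), range(2)),
    key=lambda t: rollout(t, toy_step, s0=(0.5, 0.0)),
)
print("best tree: feature=%d, threshold=%.1f, actions=(%d,%d)" % best)
```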
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for dealing with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
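A hedged sketch of the learn-stochastic, deploy-deterministic pattern the paper analyses (networks, step sizes, and stand-in data are invented): a Gaussian policy with a tunable exploration level is trained by policy gradient, and only its mean is executed at deployment.

```python
import torch
import torch.nn as nn

mean_net = nn.Linear(4, 1)                        # deterministic part
log_std = torch.tensor(-0.5, requires_grad=True)  # exploration level
opt = torch.optim.Adam(list(mean_net.parameters()) + [log_std], lr=1e-3)

obs = torch.randn(64, 4)                          # stand-in batch of states
dist = torch.distributions.Normal(mean_net(obs).squeeze(-1), log_std.exp())
actions = dist.sample()
returns = torch.randn(64)                         # stand-in returns
loss = -(dist.log_prob(actions) * returns).mean()
opt.zero_grad(); loss.backward(); opt.step()

# At deployment, act deterministically with the mean only:
# action = mean_net(obs)
```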
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
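A hedged sketch of the distillation recipe for the GBM case (the expert here is an invented stand-in for a trained neural policy): label states with the expert's actions, then fit a supervised boosted model on those labels.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def expert_policy(obs):                  # stand-in for a trained neural policy
    return (obs[:, 0] + 0.5 * obs[:, 1] > 0).astype(int)

states = np.random.randn(5000, 4)        # states collected from rollouts
actions = expert_policy(states)          # teacher labels

student = GradientBoostingClassifier(n_estimators=100, max_depth=3)
student.fit(states, actions)
print("agreement with expert: %.3f" % student.score(states, actions))
```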
- Nash Learning from Human Feedback [86.09617990412941]
We introduce an alternative pipeline for the fine-tuning of large language models using pairwise human feedback.
We term this approach Nash learning from human feedback (NLHF).
We present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent.
arXiv Detail & Related papers (2023-12-01T19:26:23Z)
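A hedged, tabular toy of the mirror-descent intuition behind Nash-MD (the preference matrix, constants, and regularised update form are invented stand-ins; the actual algorithm operates on LLM policies and differs in detail): each response's weight is updated multiplicatively by its win rate against the current mixture, with KL regularisation towards a reference policy.

```python
import numpy as np

P = np.array([[0.5, 0.7, 0.2],            # P[i, j] = Pr(response i preferred to j)
              [0.3, 0.5, 0.8],
              [0.8, 0.2, 0.5]])
pi = np.ones(3) / 3                        # current policy over responses
mu = np.ones(3) / 3                        # reference policy
eta, tau = 0.5, 0.2                        # step size, KL regularisation
for _ in range(1000):
    payoff = P @ pi                        # win rate of each response vs pi
    logits = (1 - eta * tau) * np.log(pi) + eta * tau * np.log(mu) + eta * payoff
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
print(np.round(pi, 3))                     # approximate regularised Nash mixture
```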
- Efficient Symbolic Policy Learning with Differentiable Symbolic Expression [30.855457609733637]
We propose an efficient gradient-based learning method that learns the symbolic policy from scratch in an end-to-end way.
In addition, in contrast with previous symbolic policies, which only work in single-task RL because of complexity, we extend ESPL to meta-RL to generate symbolic policies for unseen tasks.
arXiv Detail & Related papers (2023-11-02T03:27:51Z)
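A hedged sketch of gradient-based symbolic policy learning in this spirit (a softmax relaxation over invented primitives, not ESPL's actual path selector): the discrete operator choice is relaxed so the whole expression trains end-to-end, and the argmax primitive is read off at the end as a symbolic formula.

```python
import torch
import torch.nn as nn

PRIMITIVES = [torch.sin, torch.cos, torch.tanh, lambda z: z, lambda z: z * z]

class SoftSymbolic(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(obs_dim))             # linear argument
        self.b = nn.Parameter(torch.zeros(1))
        self.sel = nn.Parameter(torch.zeros(len(PRIMITIVES)))   # operator choice

    def forward(self, obs):
        z = obs @ self.w + self.b
        outs = torch.stack([f(z) for f in PRIMITIVES], dim=-1)
        return outs @ torch.softmax(self.sel, dim=0)            # soft operator mix

policy = SoftSymbolic(obs_dim=4)
action = policy(torch.randn(8, 4))                # (8,) continuous actions
# After training, torch.argmax(policy.sel) names the surviving primitive,
# e.g. action = tanh(w . obs + b).
```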
- Deep Explainable Relational Reinforcement Learning: A Neuro-Symbolic Approach [18.38878415765146]
We propose Deep Explainable Relational Reinforcement Learning (DERRL), a framework that exploits the best of both the neural and symbolic worlds.
DERRL combines relational representations and constraints from symbolic planning with deep learning to extract interpretable policies.
These policies are in the form of logical rules that explain how each decision (or action) is arrived at.
arXiv Detail & Related papers (2023-04-17T15:11:40Z)
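A hand-written toy showing the kind of rule-shaped policy described above (the rules and the blocks-world state are invented for illustration, not learned by DERRL): each action fires when its body of relational predicates holds, so every decision is explained by the rule that produced it.

```python
def on(a, b, state):
    return (a, b) in state["on"]          # fact: block a rests on block b

def clear(c, state):
    return all(lower != c for _, lower in state["on"])  # nothing rests on c

def policy(state, goal):
    # move(X, Y) :- goal_on(X, Y), clear(X), clear(Y), not on(X, Y).
    for x, y in goal["on"]:
        if clear(x, state) and clear(y, state) and not on(x, y, state):
            return ("move", x, y)
    # unstack(X, Y) :- on(X, Y), not goal_on(X, Y).
    for x, y in state["on"]:
        if (x, y) not in goal["on"]:
            return ("unstack", x, y)
    return ("noop",)

state = {"on": [("a", "b")]}              # blocks world: a sits on b
goal = {"on": [("b", "a")]}               # desired: b on a instead
print(policy(state, goal))                # -> ('unstack', 'a', 'b')
```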
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
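A hedged structural skeleton of the alternating loop (least squares and a grid search stand in for the paper's model and policy objectives; the specific lower bound, where the model step also adapts to the current policy, is not reproduced): fit a model to the fixed dataset, improve the policy against the model with a conservatism penalty, repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 2))                      # offline states
A = rng.normal(size=(1000, 1))                      # offline actions
S2 = S + 0.1 * A + 0.01 * rng.normal(size=(1000, 2))  # next states

theta = 0.0                                         # policy: a = theta * s[0]
for it in range(5):
    # (1) model step: least-squares dynamics fit on the fixed dataset
    X = np.hstack([S, A])
    W, *_ = np.linalg.lstsq(X, S2, rcond=None)
    # (2) policy step: pick theta maximising the penalised model return
    def model_return(th, penalty=0.1):
        s, total = np.array([1.0, 0.0]), 0.0
        for _ in range(20):
            a = np.array([th * s[0]])
            s = np.hstack([s, a]) @ W               # roll out the learned model
            total += -np.abs(s[0]) - penalty * np.abs(a[0])  # conservatism
        return total
    cand = np.linspace(-2, 2, 41)
    theta = cand[np.argmax([model_return(c) for c in cand])]
print("learned policy gain:", theta)
```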
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered in the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
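The headline result admits a compact statement; the following is a hedged paraphrase of the chain-rule form such a theorem takes, not the paper's exact statement or conditions.

```latex
% For a utility F of the state-action occupancy measure \lambda^{\pi_\theta},
% the chain rule reduces the gradient to a standard policy gradient whose
% "reward" is the derivative of F at the current occupancy:
\nabla_\theta F\big(\lambda^{\pi_\theta}\big)
  = \Big\langle \nabla_\lambda F\big(\lambda^{\pi_\theta}\big),\,
      \nabla_\theta \lambda^{\pi_\theta} \Big\rangle
  = \mathbb{E}_{\pi_\theta}\Big[\sum_{t \ge 0} \gamma^t\,
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,
      Q^{\pi_\theta}_{r}(s_t, a_t)\Big],
\qquad r(s, a) := \frac{\partial F(\lambda^{\pi_\theta})}{\partial \lambda(s, a)}.
```

When F is linear in the occupancy measure this collapses to the classical policy gradient theorem, which is the "Linear RL" special case the summary contrasts against.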
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
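A hedged structural sketch of the scheme described above (the experts and scorer are invented; the paper's actual contribution, the training procedure for the non-differentiable selection, is not shown): a complex network only scores which simple, human-readable expert to run, while the executed policy remains a hard choice among the experts.

```python
import torch
import torch.nn as nn

experts = [
    lambda s: torch.zeros(s.shape[0]),    # expert 0: do nothing
    lambda s: -s[:, 0],                   # expert 1: oppose position
    lambda s: -s[:, 1],                   # expert 2: oppose velocity
]
scorer = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, len(experts)))

def act(state):                           # state: (batch, 2)
    k = scorer(state).argmax(dim=-1)      # hard, non-differentiable pick
    acts = torch.stack([e(state) for e in experts], dim=-1)
    return acts.gather(-1, k.unsqueeze(-1)).squeeze(-1), k

state = torch.randn(4, 2)
action, chosen = act(state)
print(chosen.tolist())                    # which readable expert ran
```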
This list is automatically generated from the titles and abstracts of the papers on this site.