"So, Tell Me About Your Policy...": Distillation of interpretable policies from Deep Reinforcement Learning agents
- URL: http://arxiv.org/abs/2507.07848v2
- Date: Tue, 29 Jul 2025 08:20:12 GMT
- Title: "So, Tell Me About Your Policy...": Distillation of interpretable policies from Deep Reinforcement Learning agents
- Authors: Giovanni Dispoto, Paolo Bonetti, Marcello Restelli
- Abstract summary: We propose a novel algorithm that can extract an interpretable policy without disregarding the peculiarities of expert behavior. In contrast to previous works, our approach enables the training of an interpretable policy using previously collected experience. The proposed algorithm is empirically evaluated on classic control environments and on a financial trading scenario.
- Score: 37.18643811339418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Reinforcement Learning (RL) largely benefit from the inclusion of Deep Neural Networks, boosting the number of novel approaches in the field of Deep Reinforcement Learning (DRL). These techniques have demonstrated the ability to tackle complex games such as Atari and Go, as well as real-world applications, including financial trading. Nevertheless, a significant challenge emerges from the lack of interpretability, particularly when attempting to comprehend the underlying patterns learned, the relative importance of the state features, and how they are combined to generate the policy's output. For this reason, in mission-critical and real-world settings, it is often preferable to deploy a simpler, more interpretable algorithm, although at the cost of performance. In this paper, we propose a novel algorithm, supported by theoretical guarantees, that can extract an interpretable policy (e.g., a linear policy) without disregarding the peculiarities of the expert's behavior. This result is obtained by considering the advantage function, which encodes information about why an action is superior to the others. In contrast to previous works, our approach enables the training of an interpretable policy using previously collected experience. The proposed algorithm is empirically evaluated on classic control environments and on a financial trading scenario, demonstrating its ability to extract meaningful information from complex expert policies.
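To make the idea concrete, below is a minimal sketch of advantage-weighted distillation of a linear policy from offline expert experience. It is an illustrative assumption of how the abstract's ingredients (an interpretable linear policy, previously collected transitions, and advantage information indicating why the expert's action was preferred) could fit together, not the paper's exact algorithm; the function names, the exponential weighting scheme, and the data format are hypothetical.

```python
# Hypothetical sketch: fit a linear softmax policy to logged expert actions,
# weighting each transition by the expert's advantage so that strongly
# preferred actions dominate the fit. Assumed interface, not the paper's method.
import numpy as np

def distill_linear_policy(states, expert_actions, advantages, n_actions,
                          lr=0.1, epochs=200):
    n_features = states.shape[1]
    W = np.zeros((n_features, n_actions))  # parameters of the linear policy

    # Exponentiated (shifted) advantages: transitions where the expert's action
    # was clearly better than the alternatives receive larger weights.
    weights = np.exp(advantages - advantages.max())

    for _ in range(epochs):
        logits = states @ W                           # (N, n_actions)
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)

        # Gradient of the advantage-weighted cross-entropy loss w.r.t. W
        one_hot = np.eye(n_actions)[expert_actions]
        grad = states.T @ (weights[:, None] * (probs - one_hot)) / len(states)
        W -= lr * grad
    return W

# Toy usage: 1000 logged transitions with 4 state features and 2 actions.
rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 4))
A = rng.integers(0, 2, size=1000)
adv = rng.normal(size=1000)
W = distill_linear_policy(S, A, adv, n_actions=2)
print("Greedy actions for first 5 states:", (S[:5] @ W).argmax(axis=1))
```

Because the policy is linear, the learned weight matrix can be read directly as the contribution of each state feature to each action's score, which is the kind of interpretability the abstract targets.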
Related papers
- From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation [2.08099858257632]
We propose a novel model-agnostic approach to transform complex deep RL policies into transparent representations. We evaluate our approach with three existing deep RL algorithms and validate its performance in two classic control environments.
arXiv Detail & Related papers (2025-01-16T22:11:03Z)
- PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods [0.0]
We introduce PG-Rainbow, a novel algorithm that incorporates a distributional reinforcement learning framework with a policy gradient algorithm.
We show empirical results that through the integration of reward distribution information into the policy network, the policy agent acquires enhanced capabilities.
arXiv Detail & Related papers (2024-07-18T04:18:52Z)
- Amortized nonmyopic active search via deep imitation learning [16.037812098340343]
Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class.
We study the amortization of this policy by training a neural network to learn to search.
Our network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions.
arXiv Detail & Related papers (2024-05-23T20:10:29Z)
- Representation-Driven Reinforcement Learning [57.44609759155611]
We present a representation-driven framework for reinforcement learning.
By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation.
We demonstrate the effectiveness of this framework through its application to evolutionary and policy gradient-based approaches.
arXiv Detail & Related papers (2023-05-31T14:59:12Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.