A Survey on Interpretable Reinforcement Learning
- URL: http://arxiv.org/abs/2112.13112v1
- Date: Fri, 24 Dec 2021 17:26:57 GMT
- Title: A Survey on Interpretable Reinforcement Learning
- Authors: Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang,
Jianye Hao and Wulong Liu
- Abstract summary: This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL).
We distinguish interpretability (as a property of a model) from explainability (as a post-hoc operation, with the intervention of a proxy).
We argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making.
- Score: 28.869513255570077
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although deep reinforcement learning has become a promising machine learning
approach for sequential decision-making problems, it is still not mature enough
for high-stake domains such as autonomous driving or medical applications. In
such contexts, a learned policy needs, for instance, to be interpretable, so that
it can be inspected before any deployment (e.g., for safety and verifiability
reasons). This survey provides an overview of various approaches to achieve
higher interpretability in reinforcement learning (RL). To that aim, we
distinguish interpretability (as a property of a model) and explainability (as
a post-hoc operation, with the intervention of a proxy) and discuss them in the
context of RL with an emphasis on the former notion. In particular, we argue
that interpretable RL may embrace different facets: interpretable inputs,
interpretable (transition/reward) models, and interpretable decision-making.
Based on this scheme, we summarize and analyze recent work related to
interpretable RL with an emphasis on papers published in the past 10 years. We
also discuss briefly some related research areas and point to some potential
promising research directions.
Related papers
- Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop [7.630967411418269]
Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions.
Should explainable and interpretable agents be developed outside of domains where transparency is imperative?
How can we rigorously define and measure interpretability in policies, without user studies?
arXiv Detail & Related papers (2024-04-16T20:53:17Z)
- The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis [21.342945716103884]
In-context learning (ICL) enables large language models to perform tasks proficiently from demonstration examples alone.
In this paper, we present a thorough survey on the interpretation and analysis of in-context learning.
We believe that our work establishes the basis for further exploration into the interpretation of in-context learning.
arXiv Detail & Related papers (2023-11-01T02:40:42Z)
- A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into interpretable cross-modal reasoning (I-CMR) and presents a comprehensive overview of the typical methods with a three-level taxonomy.
arXiv Detail & Related papers (2023-09-05T05:06:48Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
We consider three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We examine the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Interpretability and Explainability: A Machine Learning Zoo Mini-tour [4.56877715768796]
Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences.
We emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art.
arXiv Detail & Related papers (2020-12-03T10:11:52Z)
- An Investigation of Language Model Interpretability via Sentence Editing [5.492504126672887]
We re-purpose a sentence editing dataset as a testbed for the interpretability of pre-trained language models (PLMs).
This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability.
The investigation yields new insights; for example, contrary to common understanding, we find that attention weights correlate well with human rationales.
arXiv Detail & Related papers (2020-11-28T00:46:43Z)
- Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for Post-Hoc Interpretability [54.85658598523915]
We propose that a concrete definition of interpretation is needed before the faithfulness of an interpretation can be evaluated.
We find that although interpretation methods perform differently under a given evaluation metric, such a difference may not result from interpretation quality or faithfulness.
arXiv Detail & Related papers (2020-09-16T06:38:03Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work toward attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.