A Survey on Interpretable Reinforcement Learning
- URL: http://arxiv.org/abs/2112.13112v1
- Date: Fri, 24 Dec 2021 17:26:57 GMT
- Title: A Survey on Interpretable Reinforcement Learning
- Authors: Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang,
Jianye Hao and Wulong Liu
- Abstract summary: This survey provides an overview of various approaches to achieving higher interpretability in reinforcement learning (RL).
We distinguish interpretability (a property of a model) from explainability (a post-hoc operation involving a proxy).
We argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making.
- Score: 28.869513255570077
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although deep reinforcement learning has become a promising machine learning
approach for sequential decision-making problems, it is still not mature enough
for high-stakes domains such as autonomous driving or medical applications. In
such contexts, a learned policy needs, for instance, to be interpretable, so that
it can be inspected before any deployment (e.g., for safety and verifiability
reasons). This survey provides an overview of various approaches to achieving
higher interpretability in reinforcement learning (RL). To that end, we
distinguish interpretability (as a property of a model) from explainability (as
a post-hoc operation involving a proxy) and discuss them in the
context of RL with an emphasis on the former notion. In particular, we argue
that interpretable RL may embrace different facets: interpretable inputs,
interpretable (transition/reward) models, and interpretable decision-making.
Based on this scheme, we summarize and analyze recent work related to
interpretable RL with an emphasis on papers published in the past 10 years. We
also discuss briefly some related research areas and point to some potential
promising research directions.
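To make the third facet, interpretable decision-making, concrete, here is a minimal sketch (not taken from the survey) of one widely studied route: distilling a black-box policy into a shallow decision tree whose branching rules can be read and audited before deployment. The teacher policy, the 4-dimensional state space, and the feature names are assumptions made for this illustration.
```python
# Illustrative sketch: distill a black-box policy into a shallow decision
# tree so its decision rules can be inspected before deployment.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

def teacher_policy(state):
    # Stand-in for a trained deep RL agent; a simple rule keeps the
    # example self-contained.
    return int(state[2] + 0.1 * state[3] > 0.0)

# Query the teacher on sampled states to build a supervised dataset.
states = rng.uniform(-1.0, 1.0, size=(5000, 4))
actions = np.array([teacher_policy(s) for s in states])

# The depth cap trades fidelity to the teacher for human readability.
tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(tree, feature_names=["x0", "x1", "angle", "angular_vel"]))
```
The printed tree is a policy that can be audited rule by rule, which is the kind of pre-deployment inspectability the abstract motivates for high-stakes domains.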
Related papers
- A Comprehensive Survey on Evidential Deep Learning and Its Applications [64.83473301188138]
Evidential Deep Learning (EDL) provides reliable uncertainty estimation with minimal additional computation in a single forward pass.
We first delve into the theoretical foundation of EDL, subjective logic, and discuss its distinctions from other uncertainty estimation frameworks.
We elaborate on its extensive applications across various machine learning paradigms and downstream tasks.
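As a concrete illustration of the single-forward-pass uncertainty estimate, here is a minimal sketch of the standard evidential output head (following Sensoy et al., 2018, on which this line of work builds): the network emits non-negative per-class evidence, and uncertainty falls out in closed form from the induced Dirichlet. The logits below are placeholder values.
```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.3, -0.5, 0.8])  # placeholder outputs for K = 3 classes

evidence = F.softplus(logits)    # non-negative evidence e_k
alpha = evidence + 1.0           # Dirichlet parameters alpha_k = e_k + 1
S = alpha.sum()                  # Dirichlet strength

prob = alpha / S                 # expected class probabilities
belief = evidence / S            # subjective-logic belief masses b_k
uncertainty = alpha.numel() / S  # vacuity u = K / S; sum(b_k) + u = 1

print(prob, belief, uncertainty)
```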
arXiv Detail & Related papers (2024-09-07T05:55:06Z)
- Demystifying Reinforcement Learning in Production Scheduling via Explainable AI [0.7515066610159392]
Deep Reinforcement Learning (DRL) is a frequently employed technique to solve scheduling problems.
Although DRL agents excel at delivering viable results in short computing times, their reasoning remains opaque.
We apply two explainable AI (xAI) frameworks to describe the reasoning behind the scheduling decisions of a specialized DRL agent in a flow production setting.
arXiv Detail & Related papers (2024-08-19T09:39:01Z)
- Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop [7.630967411418269]
Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions.
Should explainable and interpretable agents be developed outside of domains where transparency is imperative?
How can we rigorously define and measure interpretability in policies, without user studies?
arXiv Detail & Related papers (2024-04-16T20:53:17Z)
- The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis [20.142154624977582]
In-context learning (ICL) enables large language models to perform new tasks from only a few demonstration examples.
In this paper, we present a thorough survey on the interpretation and analysis of in-context learning.
We believe that our work establishes the basis for further exploration into the interpretation of in-context learning.
arXiv Detail & Related papers (2023-11-01T02:40:42Z)
- A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR).
This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
arXiv Detail & Related papers (2023-09-05T05:06:48Z)
- Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper conducts a series of analytical experiments to examine the relation between multi-head self-attention and final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
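For readers who want to reproduce this kind of inspection, the sketch below extracts per-head passage-to-question attention from a BERT-style encoder via Hugging Face Transformers; the checkpoint and example text are assumptions, not the paper's exact setup.
```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

question = "Who wrote the report?"
passage = "The report was written by the safety team in 2020."
inputs = tok(question, passage, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# out.attentions: tuple of num_layers tensors, each (batch, heads, seq, seq).
# token_type_ids mark the question segment (0, incl. special tokens) and the
# passage segment (1), so boolean masks slice out passage-to-question mass.
is_passage = inputs["token_type_ids"][0].bool()
last = out.attentions[-1][0]                  # last layer: (heads, seq, seq)
p2q = last[:, is_passage][:, :, ~is_passage]  # passage rows, question columns
print(p2q.mean(dim=(1, 2)))                   # mean passage-to-question mass per head
```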
arXiv Detail & Related papers (2021-08-26T04:23:57Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To measure faithfulness, we start from three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
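The removal-based criterion, in particular, is easy to make concrete: delete the tokens an interpretation ranks highest and measure how much the model's confidence drops (a comprehensiveness-style score). The toy bag-of-words model and stand-in attribution scores below are assumptions for illustration, not the paper's method.
```python
import numpy as np

# Toy sentiment "model": logistic over summed word weights.
vocab_weights = {"excellent": 2.0, "boring": -1.5, "plot": 0.1, "the": 0.0}

def model_prob(tokens):
    z = sum(vocab_weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + np.exp(-z))

tokens = ["the", "plot", "was", "excellent"]
# Stand-in saliency scores; a real study would plug in the interpretation
# method under evaluation (e.g., gradients or attention) here.
attributions = {t: abs(vocab_weights.get(t, 0.0)) for t in tokens}

def comprehensiveness(tokens, attributions, k):
    # Remove the k top-attributed tokens and measure the confidence drop.
    top = sorted(tokens, key=lambda t: attributions[t], reverse=True)[:k]
    kept = [t for t in tokens if t not in top]
    return model_prob(tokens) - model_prob(kept)

print(comprehensiveness(tokens, attributions, k=1))  # large drop => faithful
```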
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Interpretability and Explainability: A Machine Learning Zoo Mini-tour [4.56877715768796]
Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences.
We emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art.
arXiv Detail & Related papers (2020-12-03T10:11:52Z)
- An Investigation of Language Model Interpretability via Sentence Editing [5.492504126672887]
We repurpose a sentence editing dataset as a testbed for the interpretability of pre-trained language models (PLMs).
This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability.
The investigation generates new insights, for example, contrary to the common understanding, we find that attention weights correlate well with human rationales.
arXiv Detail & Related papers (2020-11-28T00:46:43Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at achieving Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.