A Survey on Explainable Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2502.06869v1
- Date: Sat, 08 Feb 2025 05:30:31 GMT
- Title: A Survey on Explainable Deep Reinforcement Learning
- Authors: Zelei Cheng, Jiahao Yu, Xinyu Xing
- Abstract summary: Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making tasks across diverse domains.
Its reliance on black-box neural architectures hinders interpretability, trust, and deployment in high-stakes applications.
Explainable Deep Reinforcement Learning (XRL) addresses these challenges by enhancing transparency through feature-level, state-level, dataset-level, and model-level explanation techniques.
- Score: 18.869827229746697
- Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making tasks across diverse domains, yet its reliance on black-box neural architectures hinders interpretability, trust, and deployment in high-stakes applications. Explainable Deep Reinforcement Learning (XRL) addresses these challenges by enhancing transparency through feature-level, state-level, dataset-level, and model-level explanation techniques. This survey provides a comprehensive review of XRL methods, evaluates their qualitative and quantitative assessment frameworks, and explores their role in policy refinement, adversarial robustness, and security. Additionally, we examine the integration of reinforcement learning with Large Language Models (LLMs), particularly through Reinforcement Learning from Human Feedback (RLHF), which optimizes AI alignment with human preferences. We conclude by highlighting open research challenges and future directions to advance the development of interpretable, reliable, and accountable DRL systems.
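As a concrete illustration of the feature-level explanation techniques the survey covers, below is a minimal gradient-saliency sketch over the input state of a toy policy network. The architecture and the CartPole-like 4-dimensional state are illustrative assumptions, not a reference implementation from the paper.

```python
# Hedged sketch: gradient saliency as a feature-level explanation.
# The policy network and state below are illustrative assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(  # stand-in policy network (assumed)
    nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2)
)

state = torch.randn(1, 4, requires_grad=True)  # e.g., a CartPole-like state
logits = policy(state)
action = int(logits.argmax(dim=-1))

# Saliency: gradient of the chosen action's score w.r.t. each input feature;
# large magnitudes mark the features that most influence the decision.
logits[0, action].backward()
saliency = state.grad.abs().squeeze()
print(saliency)  # per-feature importance scores
```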
Related papers
- Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide [2.152298082788376]
Probabilistic robustness (PR) offers a more practical perspective by quantifying the likelihood of failures under perturbations.
This paper provides a concise yet comprehensive overview of PR, covering its formal definitions, evaluation and enhancement methods.
We explore the integration of PR verification evidence into system-level safety assurance, addressing challenges in translating DL model-level robustness to system-level claims.
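A minimal Monte Carlo sketch of the PR idea, assuming a classifier callable `model` and a uniform perturbation model (both illustrative):

```python
# Hedged sketch: estimate probabilistic robustness by sampling random
# perturbations and measuring how often the prediction stays unchanged.
import numpy as np

def estimate_pr(model, x, eps=0.1, n_samples=1000, seed=0):
    """Empirical P(prediction unchanged) under uniform noise of radius eps."""
    rng = np.random.default_rng(seed)
    y_ref = model(x).argmax()
    noise = rng.uniform(-eps, eps, size=(n_samples,) + x.shape)
    hits = sum(model(x + d).argmax() == y_ref for d in noise)
    return hits / n_samples  # empirical robustness probability
```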
arXiv Detail & Related papers (2025-02-20T18:47:17Z)
- A Comprehensive Survey of Reinforcement Learning: From Algorithms to Practical Challenges [2.2448567386846916]
Reinforcement Learning (RL) has emerged as a powerful paradigm in Artificial Intelligence (AI).
This paper presents a comprehensive survey of RL, meticulously analyzing a wide range of algorithms.
We offer practical insights into the selection and implementation of RL algorithms, addressing common challenges like convergence, stability, and the exploration-exploitation dilemma.
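The exploration-exploitation dilemma mentioned above can be made concrete with a tabular Q-learning sketch using an epsilon-greedy policy; the 5-state chain environment is an illustrative assumption:

```python
# Hedged sketch: epsilon-greedy Q-learning on a toy chain environment.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

def step(s, a):
    # toy chain: action 1 moves right (reward at the end), action 0 resets
    s2 = min(s + 1, n_states - 1) if a == 1 else 0
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

s = 0
for _ in range(5000):
    # explore with probability eps, otherwise exploit the current estimate
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # TD update
    s = 0 if s2 == n_states - 1 else s2
print(Q)  # action 1 dominates once exploration has paid off
```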
arXiv Detail & Related papers (2024-11-28T03:53:14Z)
- A Comprehensive Survey on Evidential Deep Learning and Its Applications [64.83473301188138]
Evidential Deep Learning (EDL) provides reliable uncertainty estimation with minimal additional computation in a single forward pass.
We first delve into the theoretical foundation of EDL, the subjective logic theory, and discuss its distinctions from other uncertainty estimation frameworks.
We elaborate on its extensive applications across various machine learning paradigms and downstream tasks.
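The single-forward-pass uncertainty of EDL follows directly from subjective logic: per-class evidence e_k parameterizes a Dirichlet with alpha_k = e_k + 1, and the vacuity u = K / sum(alpha) shrinks as total evidence grows. A minimal sketch (the evidence vector is an illustrative assumption):

```python
# Hedged sketch: EDL belief masses and uncertainty under subjective logic.
import numpy as np

evidence = np.array([12.0, 3.0, 0.5])  # assumed per-class evidence
alpha = evidence + 1.0                 # Dirichlet parameters
K = len(alpha)
belief = evidence / alpha.sum()        # per-class belief mass
uncertainty = K / alpha.sum()          # vacuity; belief.sum() + uncertainty == 1
prob = alpha / alpha.sum()             # expected class probabilities
print(belief, uncertainty, prob)
```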
arXiv Detail & Related papers (2024-09-07T05:55:06Z)
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
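A minimal retrieve-then-generate sketch of the RAG pattern; the toy corpus, the bag-of-characters embedding, and the `llm` callable are illustrative assumptions, not a specific RA-LLM system:

```python
# Hedged sketch: cosine-similarity retrieval + prompt augmentation.
import numpy as np

corpus = ["RAG couples an LLM with a retriever over an external corpus.",
          "Reward models score candidate responses in RLHF."]

def embed(text):
    # stand-in embedding: bag-of-characters (assumed, for illustration only)
    v = np.zeros(64)
    for ch in text.lower():
        v[ord(ch) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, k=1):
    sims = [embed(doc) @ embed(query) for doc in corpus]
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

def rag_answer(query, llm):
    context = "\n".join(retrieve(query))
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```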
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
- RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs [49.386699863989335]
Training large language models (LLMs) to serve as effective assistants for humans requires careful consideration.
A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences.
In this paper, we analyze RLHF through the lens of reinforcement learning principles to develop an understanding of its fundamentals.
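The reward-modeling step at the heart of RLHF is typically a pairwise Bradley-Terry objective: push the learned reward of the human-preferred response above the rejected one. A minimal sketch, with a linear reward head and random response embeddings as illustrative assumptions:

```python
# Hedged sketch: pairwise reward-model training with a Bradley-Terry loss.
import torch
import torch.nn.functional as F

reward_model = torch.nn.Linear(16, 1)  # stand-in reward head (assumed)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

chosen = torch.randn(8, 16)    # embeddings of preferred responses (assumed)
rejected = torch.randn(8, 16)  # embeddings of rejected responses (assumed)

for _ in range(100):
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()  # -log P(chosen preferred | rewards)
    opt.zero_grad(); loss.backward(); opt.step()
```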
arXiv Detail & Related papers (2024-04-12T15:54:15Z)
- Structure in Deep Reinforcement Learning: A Survey and Open Problems [22.77618616444693]
Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications.
However, its practicality in addressing various real-world scenarios, characterized by diverse and unpredictable dynamics, remains limited.
This limitation stems from poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability.
arXiv Detail & Related papers (2023-06-28T08:48:40Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
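One simple way to turn crowd-sourced, possibly conflicting labels into a training signal is to aggregate per-annotator votes into soft labels before reward learning; this aggregation rule is an illustrative assumption, not the paper's method:

```python
# Hedged sketch: aggregate diverse pairwise preference votes.
import numpy as np

# votes[i, j] = 1 if annotator j preferred trajectory A over B on pair i
votes = np.array([[1, 1, 0],
                  [0, 0, 1],
                  [1, 1, 1]])

soft_labels = votes.mean(axis=1)        # fraction preferring A per pair
keep = np.abs(soft_labels - 0.5) > 0.2  # optionally drop split decisions
print(soft_labels, keep)
```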
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges [38.70863329476517]
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal.
Despite encouraging results, the deep neural network backbone is widely regarded as a black box, which keeps practitioners from trusting and deploying trained agents in realistic scenarios where high security and reliability are essential.
To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by building intrinsically interpretable models or by attaching post-hoc explanations.
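As a toy example of the post-hoc route, a black-box policy can be distilled into a shallow decision tree over logged states; the random states and the stand-in policy below are illustrative assumptions:

```python
# Hedged sketch: post-hoc explanation by decision-tree policy distillation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))                       # logged states (assumed)
actions = (states[:, 0] + states[:, 2] > 0).astype(int)  # stand-in black box

tree = DecisionTreeClassifier(max_depth=2).fit(states, actions)
print(export_text(tree, feature_names=[f"s{i}" for i in range(4)]))
```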
arXiv Detail & Related papers (2022-11-12T13:52:06Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose several policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
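The classical intuition behind instrumental variables can be seen in a two-stage least squares (2SLS) sketch on synthetic data with a hidden confounder; this is the textbook estimator, not the paper's method, and all quantities below are assumed:

```python
# Hedged sketch: 2SLS recovers the causal effect despite a hidden confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
u = rng.normal(size=n)                       # unmeasured confounder
z = rng.normal(size=n)                       # instrument: moves a, not r
a = 0.8 * z + u + rng.normal(size=n)         # confounded action
r = 2.0 * a + 3.0 * u + rng.normal(size=n)   # reward; true effect of a is 2.0

naive = (a @ r) / (a @ a)                    # OLS estimate, biased by u
a_hat = z * ((z @ a) / (z @ z))              # stage 1: project a onto z
iv = (a_hat @ r) / (a_hat @ a_hat)           # stage 2: ~2.0, confounder-free
print(f"naive={naive:.2f}  iv={iv:.2f}")
```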
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Deep Reinforcement Learning Versus Evolution Strategies: A Comparative Survey [2.554326189662943]
Deep Reinforcement Learning (DRL) and Evolution Strategies (ESs) have surpassed human-level control in many sequential decision-making problems.
To shed light on the relative strengths and weaknesses of DRL and ESs, the paper analyzes their respective capabilities and limitations.
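For contrast with gradient-based DRL, a minimal Evolution Strategies sketch in the OpenAI-ES style estimates a search gradient purely from the fitness of Gaussian-perturbed parameters, with no backpropagation; the quadratic fitness function is an illustrative assumption:

```python
# Hedged sketch: OpenAI-style Evolution Strategies on a toy objective.
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):                 # toy objective: maximum at theta == 3
    return -np.sum((theta - 3.0) ** 2)

theta = np.zeros(5)
sigma, lr, pop = 0.1, 0.02, 50      # noise scale, step size, population
for _ in range(300):
    noise = rng.normal(size=(pop, theta.size))
    scores = np.array([fitness(theta + sigma * n) for n in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize
    theta += lr / (pop * sigma) * noise.T @ scores             # ES gradient
print(theta)  # approaches [3, 3, 3, 3, 3]
```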
arXiv Detail & Related papers (2021-09-28T18:45:30Z)
- Stylistic Dialogue Generation via Information-Guided Reinforcement Learning Strategy [65.98002918470544]
We introduce a new training strategy, known as Information-Guided Reinforcement Learning (IG-RL).
In IG-RL, a training model is encouraged to explore stylistic expressions while being constrained to maintain its content quality.
This is achieved by adopting a reinforcement learning strategy with statistical style-information guidance for quality-preserving exploration.
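A minimal sketch of that shaping idea: blend a statistical style score with a content-quality score so stylistic exploration cannot sacrifice quality. Both scorers and the weighting are illustrative assumptions, not the paper's exact objective:

```python
# Hedged sketch: quality-constrained style reward for dialogue generation.
def ig_rl_style_reward(response, style_score, quality_score, lam=0.5):
    """Blend style and quality; quality acts as a brake on style drift."""
    s = style_score(response)    # e.g., likelihood under a style LM (assumed)
    q = quality_score(response)  # e.g., relevance/fluency score (assumed)
    return lam * s + (1.0 - lam) * q

# usage with stand-in scorers:
reward = ig_rl_style_reward("hello there", lambda r: 0.7, lambda r: 0.9)
```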
arXiv Detail & Related papers (2020-04-05T13:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.