Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
Robustness, Safety, and Generalizability
- URL: http://arxiv.org/abs/2209.08025v1
- Date: Fri, 16 Sep 2022 16:10:08 GMT
- Title: Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
Robustness, Safety, and Generalizability
- Authors: Mengdi Xu, Zuxin Liu, Peide Huang, Wenhao Ding, Zhepeng Cen, Bo Li and
Ding Zhao
- Abstract summary: A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems.
This study provides an overview of the main perspectives of trustworthy reinforcement learning: robustness, safety, and generalizability.
- Score: 23.82257896376779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A trustworthy reinforcement learning algorithm should be competent in solving
challenging real-world problems, including robustly handling uncertainties,
satisfying safety constraints to avoid catastrophic failures, and
generalizing to unseen scenarios during deployment. This study provides an
overview of these main perspectives of trustworthy reinforcement learning,
considering its intrinsic vulnerabilities in robustness, safety, and
generalizability. In particular, we give rigorous formulations, categorize
the corresponding methodologies, and discuss benchmarks for each perspective.
Moreover, we provide an outlook section to spur promising future directions,
with a brief discussion of extrinsic vulnerabilities arising from human
feedback. We hope this survey brings separate threads of study together in a
unified framework and promotes the trustworthiness of reinforcement learning.
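As a rough, hedged illustration of how these three perspectives are commonly formalized (generic notation, not the survey's own), robustness is often cast as a worst-case objective over an uncertainty set, safety as a constrained MDP, and generalizability as expected return over a distribution of tasks:

```latex
% Illustrative formulations only; notation is generic, not taken from the survey.
% Robustness: optimize worst-case return over an uncertainty set U of transition models.
\max_{\pi} \; \min_{P \in \mathcal{U}} \;
  \mathbb{E}_{\pi, P}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]

% Safety: constrained MDP -- maximize return while keeping expected cumulative cost below a budget d.
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\Big] \le d

% Generalizability: optimize expected return over a distribution p(M) of environments,
% including ones unseen during training.
\max_{\pi} \; \mathbb{E}_{M \sim p(\mathcal{M})} \,
  \mathbb{E}_{\pi, M}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]
```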
Related papers
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method learns a better safety-aware embedding and achieves superior performance compared with previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward while keeping an expected cumulative safety cost below a given threshold.
Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains difficult.
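As a hedged sketch of how such a constrained criterion is commonly handled in practice (a generic primal-dual/Lagrangian scheme, not the method of any specific paper listed here), the cost constraint can be folded into the objective via a learned multiplier:

```python
def lagrangian_update(reward_return, cost_return, lam, cost_limit,
                      lr_lambda=0.01):
    """One dual step of a generic primal-dual safe-RL scheme (illustrative only).

    The policy is trained (elsewhere) to maximize the Lagrangian
        L(pi, lam) = E[return] - lam * (E[cost] - cost_limit),
    while the multiplier lam increases when the cost constraint is violated
    and decays toward zero when it is satisfied.
    """
    # Dual gradient ascent on lam: dL/dlam = -(E[cost] - cost_limit)
    lam = max(lam + lr_lambda * (cost_return - cost_limit), 0.0)  # keep lam >= 0

    # Shaped objective a policy-gradient step would maximize
    shaped_objective = reward_return - lam * (cost_return - cost_limit)
    return lam, shaped_objective

# Toy usage: costs above the limit push lam up, discouraging unsafe behavior.
lam = 0.0
for epoch in range(3):
    ep_reward, ep_cost = 100.0, 30.0   # pretend rollout statistics
    lam, obj = lagrangian_update(ep_reward, ep_cost, lam, cost_limit=25.0)
    print(f"lambda={lam:.3f}, shaped objective={obj:.2f}")
```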
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- A Systematic Review on Fostering Appropriate Trust in Human-AI Interaction [19.137907393497848]
Appropriate Trust in Artificial Intelligence (AI) systems has rapidly become an important area of focus for both researchers and practitioners.
Various approaches have been used to build appropriate trust, including confidence scores, explanations, trustworthiness cues, and uncertainty communication.
This paper presents a systematic review to identify current practices in building appropriate trust, different ways to measure it, types of tasks used, and potential challenges associated with it.
arXiv Detail & Related papers (2023-11-08T12:19:58Z)
- Causal Reinforcement Learning: A Survey [57.368108154871]
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty.
One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world.
Causality offers a notable advantage as it can formalize knowledge in a systematic manner.
arXiv Detail & Related papers (2023-07-04T03:00:43Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
- Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness [17.5771010094384]
Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems.
Recent work argues that a model's adversarial vulnerability is caused by non-robust features learned during supervised training.
In this paper, we tackle the adversarial challenge from the view of disentangled representation learning.
arXiv Detail & Related papers (2022-10-26T18:14:39Z)
- On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
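For context, and hedged because this follows the stochastic-setting definition of Foster et al. (2021) rather than this paper's adversarial variant, the Decision-Estimation Coefficient of a model class relative to a reference model takes roughly the following form:

```latex
% Rough form of the Decision-Estimation Coefficient (stochastic setting); illustrative only.
% f^M(\pi) is the expected value of decision \pi under model M, \pi_M is M's optimal decision,
% and D_H^2 is the squared Hellinger distance between observation distributions.
\mathrm{dec}_{\gamma}(\mathcal{M}, \bar{M})
  = \inf_{p \in \Delta(\Pi)} \; \sup_{M \in \mathcal{M}} \;
    \mathbb{E}_{\pi \sim p}\Big[ f^{M}(\pi_{M}) - f^{M}(\pi)
      - \gamma \, D_{\mathrm{H}}^{2}\big(M(\pi), \bar{M}(\pi)\big) \Big]
```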
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
- Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (ASSUDA), which maximizes the agreement between clean images and their adversarial examples via a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision tasks (e.g., rotation and jigsaw) benefit image-level tasks such as classification and recognition, they fail to provide the critical supervision signals needed to learn discriminative representations for segmentation.
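A minimal sketch of such a clean/adversarial agreement objective, written as a generic InfoNCE-style contrastive loss rather than the paper's exact formulation (the pooled feature inputs here are placeholders for output-space predictions):

```python
import torch
import torch.nn.functional as F

def clean_adv_contrastive_loss(clean_feats, adv_feats, temperature=0.1):
    """Generic InfoNCE-style agreement loss between clean and adversarial views.

    Each clean representation is pulled toward the representation of its own
    adversarial example and pushed away from those of other images in the batch.
    clean_feats, adv_feats: (batch, dim) tensors.
    """
    clean = F.normalize(clean_feats, dim=1)
    adv = F.normalize(adv_feats, dim=1)

    # Cosine-similarity logits: row i should match column i (its own adversarial view).
    logits = clean @ adv.t() / temperature
    targets = torch.arange(clean.size(0), device=clean.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random "features" standing in for pooled output-space predictions.
clean = torch.randn(8, 128)
adv = clean + 0.05 * torch.randn(8, 128)   # pretend adversarial perturbation
print(clean_adv_contrastive_loss(clean, adv).item())
```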
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- Synthesizing Safe Policies under Probabilistic Constraints with Reinforcement Learning and Bayesian Model Checking [4.797216015572358]
We introduce a framework for specifying requirements for reinforcement learners in constrained settings.
We show that an agent's confidence in constraint satisfaction provides a useful signal for balancing optimization and safety in the learning process.
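As a hedged illustration of the kind of probabilistic requirement involved (generic notation, not the paper's own), such specifications typically constrain the probability that a policy's trajectories satisfy a temporal-logic property:

```latex
% Generic probabilistic constraint: trajectories \tau drawn from policy \pi must satisfy
% property \varphi with probability at least a threshold \theta; the learner's (e.g. Bayesian)
% estimate of this satisfaction probability can then balance optimization against safety.
\Pr_{\tau \sim \pi}\big[\tau \models \varphi\big] \;\ge\; \theta
```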
arXiv Detail & Related papers (2020-05-08T08:11:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.