Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
Robustness, Safety, and Generalizability
- URL: http://arxiv.org/abs/2209.08025v1
- Date: Fri, 16 Sep 2022 16:10:08 GMT
- Title: Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
Robustness, Safety, and Generalizability
- Authors: Mengdi Xu, Zuxin Liu, Peide Huang, Wenhao Ding, Zhepeng Cen, Bo Li and
Ding Zhao
- Abstract summary: A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems.
This study surveys the main perspectives of trustworthy reinforcement learning: robustness, safety, and generalizability.
- Score: 23.82257896376779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A trustworthy reinforcement learning algorithm should be competent in solving
challenging real-world problems, including robustly handling uncertainties,
satisfying safety constraints to avoid catastrophic failures, and
generalizing to unseen scenarios during deployment. This study aims to
overview these main perspectives of trustworthy reinforcement learning
considering its intrinsic vulnerabilities on robustness, safety, and
generalizability. In particular, we give rigorous formulations, categorize
corresponding methodologies, and discuss benchmarks for each perspective.
Moreover, we provide an outlook section to spur promising future directions
with a brief discussion on extrinsic vulnerabilities considering human
feedback. We hope this survey could bring separate threads of studies together
in a unified framework and promote the trustworthiness of reinforcement
learning.
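To make the three perspectives concrete, the following is a minimal sketch of the standard formulations that work in this area typically builds on, written in generic MDP notation; the uncertainty set P, cost function c, budget d, and task distribution T are assumptions of this sketch, not notation taken from the paper itself.

```latex
% Minimal sketch (generic MDP notation, assumed rather than taken from the paper)
% of the three trustworthiness perspectives: robustness, safety, generalizability.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\begin{align*}
% Robustness: optimize the worst case over an uncertainty set of dynamics.
&\max_{\pi} \; \min_{P \in \mathcal{P}} \;
  \mathbb{E}_{\pi, P}\Big[\sum_{t} \gamma^{t} r(s_t, a_t)\Big] \\
% Safety (constrained MDP): keep the expected cumulative cost below a budget d.
&\max_{\pi} \;
  \mathbb{E}_{\pi, P}\Big[\sum_{t} \gamma^{t} r(s_t, a_t)\Big]
  \quad \text{s.t.} \quad
  \mathbb{E}_{\pi, P}\Big[\sum_{t} \gamma^{t} c(s_t, a_t)\Big] \le d \\
% Generalizability: perform well in expectation over unseen tasks \tau \sim \mathcal{T}.
&\max_{\pi} \;
  \mathbb{E}_{\tau \sim \mathcal{T}}\,
  \mathbb{E}_{\pi, P_{\tau}}\Big[\sum_{t} \gamma^{t} r_{\tau}(s_t, a_t)\Big]
\end{align*}
\end{document}
```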
Related papers
- Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide [2.152298082788376]
Probabilistic robustness (PR) offers a more practical perspective by quantifying the likelihood of failures under perturbations.
This paper provides a concise yet comprehensive overview of PR, covering its formal definitions, evaluation and enhancement methods.
We explore the integration of PR verification evidence into system-level safety assurance, addressing challenges in translating DL model-level robustness to system-level claims.
arXiv Detail & Related papers (2025-02-20T18:47:17Z)
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment [63.15719512614899]
Refusal Training (RT) struggles to generalize against various OOD jailbreaking attacks.
We observe significant improvements in generalization as N increases.
We propose training the model to perform safety reasoning for each query.
arXiv Detail & Related papers (2025-02-06T13:01:44Z)
- OpenAI o1 System Card [274.83891368890977]
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
arXiv Detail & Related papers (2024-12-21T18:04:31Z)
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to safety constraints (a minimal sketch of this criterion follows this list).
Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Causal Reinforcement Learning: A Survey [57.368108154871]
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty.
One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world.
Causality offers a notable advantage as it can formalize knowledge in a systematic manner.
arXiv Detail & Related papers (2023-07-04T03:00:43Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
- Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness [17.5771010094384]
Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems.
Recent work argues the adversarial vulnerability of the model is caused by the non-robust features in supervised training.
In this paper, we tackle the adversarial challenge from the view of disentangled representation learning.
arXiv Detail & Related papers (2022-10-26T18:14:39Z)
- On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
- Synthesizing Safe Policies under Probabilistic Constraints with Reinforcement Learning and Bayesian Model Checking [4.797216015572358]
We introduce a framework for specification of requirements for reinforcement learners in constrained settings.
We show that an agent's confidence in constraint satisfaction provides a useful signal for balancing optimization and safety in the learning process.
arXiv Detail & Related papers (2020-05-08T08:11:31Z)
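To illustrate the constrained criterion referenced in the constraint-formulation survey above, here is a toy, self-contained sketch of a Lagrangian relaxation. The one-parameter "policy" theta, the quadratic return J_r, the linear cost J_c, and all parameter names are hypothetical and are not taken from any of the listed papers.

```python
# Toy sketch of the constrained criterion used in safe RL:
#   maximize J_r(theta)  subject to  J_c(theta) <= d,
# solved here with a Lagrangian relaxation and simultaneous primal/dual updates.
# The one-parameter "policy" theta and the J_r/J_c functions below are made up
# for illustration; real safe-RL methods estimate both quantities from rollouts.

d = 1.0                      # cost budget (constraint threshold)
theta, lam = 0.0, 0.0        # policy parameter and Lagrange multiplier
lr_theta, lr_lam = 0.05, 0.05

def J_r(t):                  # toy expected return, maximized at t = 3
    return -(t - 3.0) ** 2

def J_c(t):                  # toy expected cost, grows with t
    return 0.5 * t

for _ in range(2000):
    eps = 1e-4               # numerical gradients of L = J_r - lam * (J_c - d)
    grad_r = (J_r(theta + eps) - J_r(theta - eps)) / (2 * eps)
    grad_c = (J_c(theta + eps) - J_c(theta - eps)) / (2 * eps)
    theta += lr_theta * (grad_r - lam * grad_c)      # primal ascent on theta
    lam = max(0.0, lam + lr_lam * (J_c(theta) - d))  # dual ascent on lambda

print(f"theta={theta:.2f}  cost={J_c(theta):.2f} (budget {d})  return={J_r(theta):.2f}")
```

The dual variable grows while the cost exceeds the budget and shrinks otherwise, which is one common way of trading off optimization and safety in the constrained setting studied by several of the listed works.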