Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
Robustness, Safety, and Generalizability
- URL: http://arxiv.org/abs/2209.08025v1
- Date: Fri, 16 Sep 2022 16:10:08 GMT
- Title: Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities:
Robustness, Safety, and Generalizability
- Authors: Mengdi Xu, Zuxin Liu, Peide Huang, Wenhao Ding, Zhepeng Cen, Bo Li and
Ding Zhao
- Abstract summary: A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems.
This study surveys the main perspectives of trustworthy reinforcement learning: robustness, safety, and generalizability.
- Score: 23.82257896376779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A trustworthy reinforcement learning algorithm should be competent in solving
challenging real-world problems, including robustly handling uncertainties,
satisfying safety constraints to avoid catastrophic failures, and
generalizing to unseen scenarios during deployment. This study aims to
overview these main perspectives of trustworthy reinforcement learning
considering its intrinsic vulnerabilities on robustness, safety, and
generalizability. In particular, we give rigorous formulations, categorize
corresponding methodologies, and discuss benchmarks for each perspective.
Moreover, we provide an outlook section to spur promising future directions
with a brief discussion on extrinsic vulnerabilities considering human
feedback. We hope this survey could bring separate threads of studies together
in a unified framework and promote the trustworthiness of reinforcement
learning.
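To make the three perspectives concrete, the following is a minimal sketch of the standard formulations that work in this area typically builds on, written in generic MDP notation; the uncertainty set P, cost function c, budget d, and task distribution T are assumptions of this sketch, not notation taken from the paper itself.

```latex
% Minimal sketch (generic MDP notation, assumed rather than taken from the paper)
% of the three trustworthiness perspectives: robustness, safety, generalizability.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\begin{align*}
% Robustness: optimize the worst case over an uncertainty set of dynamics.
&\max_{\pi} \; \min_{P \in \mathcal{P}} \;
  \mathbb{E}_{\pi, P}\Big[\sum_{t} \gamma^{t} r(s_t, a_t)\Big] \\
% Safety (constrained MDP): keep the expected cumulative cost below a budget d.
&\max_{\pi} \;
  \mathbb{E}_{\pi, P}\Big[\sum_{t} \gamma^{t} r(s_t, a_t)\Big]
  \quad \text{s.t.} \quad
  \mathbb{E}_{\pi, P}\Big[\sum_{t} \gamma^{t} c(s_t, a_t)\Big] \le d \\
% Generalizability: perform well in expectation over unseen tasks \tau \sim \mathcal{T}.
&\max_{\pi} \;
  \mathbb{E}_{\tau \sim \mathcal{T}}\,
  \mathbb{E}_{\pi, P_{\tau}}\Big[\sum_{t} \gamma^{t} r_{\tau}(s_t, a_t)\Big]
\end{align*}
\end{document}
```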
Related papers
- Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide [2.152298082788376]
Probabilistic robustness (PR) offers a more practical perspective by quantifying the likelihood of failures under perturbations.
This paper provides a concise yet comprehensive overview of PR, covering its formal definitions, evaluation and enhancement methods.
We explore the integration of PR verification evidence into system-level safety assurance, addressing challenges in translating DL model-level robustness to system-level claims.
arXiv Detail & Related papers (2025-02-20T18:47:17Z)
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment [63.15719512614899]
Refusal Training (RT) struggles to generalize against various OOD jailbreaking attacks.
We observe significant improvements in generalization as N increases.
We propose training the model to perform safety reasoning for each query.
arXiv Detail & Related papers (2025-02-06T13:01:44Z)
- OpenAI o1 System Card [274.83891368890977]
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
arXiv Detail & Related papers (2024-12-21T18:04:31Z)
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to safety constraints (a minimal sketch of this criterion follows this list).
Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Causal Reinforcement Learning: A Survey [57.368108154871]
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty.
One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world.
Causality offers a notable advantage as it can formalize knowledge in a systematic manner.
arXiv Detail & Related papers (2023-07-04T03:00:43Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
- Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness [17.5771010094384]
Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems.
Recent work argues the adversarial vulnerability of the model is caused by the non-robust features in supervised training.
In this paper, we tackle the adversarial challenge from the view of disentangled representation learning.
arXiv Detail & Related papers (2022-10-26T18:14:39Z)
- On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
- Synthesizing Safe Policies under Probabilistic Constraints with Reinforcement Learning and Bayesian Model Checking [4.797216015572358]
We introduce a framework for specification of requirements for reinforcement learners in constrained settings.
We show that an agent's confidence in constraint satisfaction provides a useful signal for balancing optimization and safety in the learning process.
arXiv Detail & Related papers (2020-05-08T08:11:31Z)
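To illustrate the constrained criterion referenced in the constraint-formulation survey above, here is a toy, self-contained sketch of a Lagrangian relaxation. The one-parameter "policy" theta, the quadratic return J_r, the linear cost J_c, and all parameter names are hypothetical and are not taken from any of the listed papers.

```python
# Toy sketch of the constrained criterion used in safe RL:
#   maximize J_r(theta)  subject to  J_c(theta) <= d,
# solved here with a Lagrangian relaxation and simultaneous primal/dual updates.
# The one-parameter "policy" theta and the J_r/J_c functions below are made up
# for illustration; real safe-RL methods estimate both quantities from rollouts.

d = 1.0                      # cost budget (constraint threshold)
theta, lam = 0.0, 0.0        # policy parameter and Lagrange multiplier
lr_theta, lr_lam = 0.05, 0.05

def J_r(t):                  # toy expected return, maximized at t = 3
    return -(t - 3.0) ** 2

def J_c(t):                  # toy expected cost, grows with t
    return 0.5 * t

for _ in range(2000):
    eps = 1e-4               # numerical gradients of L = J_r - lam * (J_c - d)
    grad_r = (J_r(theta + eps) - J_r(theta - eps)) / (2 * eps)
    grad_c = (J_c(theta + eps) - J_c(theta - eps)) / (2 * eps)
    theta += lr_theta * (grad_r - lam * grad_c)      # primal ascent on theta
    lam = max(0.0, lam + lr_lam * (J_c(theta) - d))  # dual ascent on lambda

print(f"theta={theta:.2f}  cost={J_c(theta):.2f} (budget {d})  return={J_r(theta):.2f}")
```

The dual variable grows while the cost exceeds the budget and shrinks otherwise, which is one common way of trading off optimization and safety in the constrained setting studied by several of the listed works.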