COPA: Certifying Robust Policies for Offline Reinforcement Learning
against Poisoning Attacks
- URL: http://arxiv.org/abs/2203.08398v1
- Date: Wed, 16 Mar 2022 05:02:47 GMT
- Title: COPA: Certifying Robust Policies for Offline Reinforcement Learning
against Poisoning Attacks
- Authors: Fan Wu, Linyi Li, Chejian Xu, Huan Zhang, Bhavya Kailkhura, Krishnaram
Kenthapadi, Ding Zhao, Bo Li
- Abstract summary: We focus on certifying the robustness of offline reinforcement learning (RL) in the presence of poisoning attacks.
We propose the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated.
We prove that some of the proposed certification methods are theoretically tight, while others correspond to NP-Complete problems.
- Score: 49.15885037760725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As reinforcement learning (RL) has achieved near human-level performance in a
variety of tasks, its robustness has raised great attention. While a vast body
of research has explored test-time (evasion) attacks in RL and corresponding
defenses, its robustness against training-time (poisoning) attacks remains
largely unanswered. In this work, we focus on certifying the robustness of
offline RL in the presence of poisoning attacks, where a subset of training
trajectories could be arbitrarily manipulated. We propose the first
certification framework, COPA, to certify the number of poisoning trajectories
that can be tolerated regarding different certification criteria. Given the
complex structure of RL, we propose two certification criteria: per-state
action stability and cumulative reward bound. To further improve the
certification, we propose new partition and aggregation protocols to train
robust policies. We further prove that some of the proposed certification
methods are theoretically tight, while others correspond to NP-Complete problems. We leverage
COPA to certify three RL environments trained with different algorithms and
conclude: (1) The proposed robust aggregation protocols such as temporal
aggregation can significantly improve the certifications; (2) Our certifications
for both per-state action stability and cumulative reward bound are efficient
and tight; (3) The certifications for different training algorithms and
environments differ, reflecting their intrinsic robustness properties. All
experimental results are available at https://copa-leaderboard.github.io.
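The partition-and-aggregation protocol at the core of COPA can be illustrated with a short sketch. The code below is a minimal illustration of DPA-style per-state action aggregation under the assumption of a discrete action space; it is not the authors' implementation, and the hash-based partitioning and the tie-breaking rule (smaller action index wins) are illustrative choices. Because each poisoned trajectory lands in at most one partition and can therefore flip at most one vote, the vote margin yields a certified number of poisoned trajectories that cannot change the aggregated action.

```python
# Minimal sketch of partition-and-aggregation for offline RL, in the spirit of
# COPA's per-state action certification. Not the authors' implementation:
# the hash-based partitioning and tie-breaking rule are illustrative assumptions.
import hashlib
from collections import Counter

def partition_trajectories(trajectories, num_partitions):
    """Deterministically hash each trajectory into one of `num_partitions` disjoint subsets."""
    parts = [[] for _ in range(num_partitions)]
    for traj in trajectories:
        digest = int(hashlib.sha256(repr(traj).encode()).hexdigest(), 16)
        parts[digest % num_partitions].append(traj)
    return parts  # train one offline-RL policy per subset (any algorithm)

def aggregated_action(policies, state, num_actions):
    """Majority-vote action plus a vote-margin certificate.

    `policies` are callables mapping a state to a discrete action; ties are
    broken toward the smaller action index. The returned `certified` value is
    a number of poisoned trajectories that provably cannot change the action,
    since each poisoned trajectory corrupts at most one partition (one vote).
    """
    votes = Counter(policy(state) for policy in policies)
    counts = [votes.get(a, 0) for a in range(num_actions)]
    best = min(range(num_actions), key=lambda a: (-counts[a], a))
    runner_up = max(
        counts[a] + (1 if a < best else 0)  # a tie against a smaller index is a loss
        for a in range(num_actions) if a != best
    )
    certified = max((counts[best] - runner_up) // 2, 0)
    return best, certified

# Toy usage with stub policies: 6 vote for action 0, 4 vote for action 1.
stub_policies = [lambda s: 0] * 6 + [lambda s: 1] * 4
print(aggregated_action(stub_policies, state=None, num_actions=2))  # -> (0, 1)
```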
Related papers
- Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks [23.907977144668838]
We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
arXiv Detail & Related papers (2022-12-28T22:33:38Z) - Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement
Learning [17.957644784944755]
We propose a novel certification method for c-MARLs to determine actions with guaranteed certified bounds.
We empirically show that our certification bounds are much tighter than those of state-of-the-art RL certification solutions.
Our method produces meaningful guaranteed robustness for all models and environments.
arXiv Detail & Related papers (2022-12-22T14:36:27Z) - Improved Certified Defenses against Data Poisoning with (Deterministic)
Finite Aggregation [122.83280749890078]
We propose an improved certified defense against general poisoning attacks, namely Finite Aggregation.
In contrast to DPA, which directly splits the training set into disjoint subsets, our method first splits the training set into smaller disjoint subsets and then combines duplicates of them to build larger, overlapping subsets.
We offer an alternative view of our method, bridging the designs of deterministic and aggregation-based certified defenses.
arXiv Detail & Related papers (2022-02-05T20:08:58Z) - URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB).
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - CROP: Certifying Robust Policies for Reinforcement Learning through
Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards.
arXiv Detail & Related papers (2021-06-17T07:58:32Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z) - Robust Deep Reinforcement Learning against Adversarial Perturbations on
State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)