Reducing Exploitability with Population Based Training
- URL: http://arxiv.org/abs/2208.05083v1
- Date: Wed, 10 Aug 2022 00:04:46 GMT
- Title: Reducing Exploitability with Population Based Training
- Authors: Pavel Czempin and Adam Gleave
- Abstract summary: Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games.
Prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies.
We propose a defense using population based training to pit the victim against a diverse set of opponents.
- Score: 2.538209532048867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-play reinforcement learning has achieved state-of-the-art, and often
superhuman, performance in a variety of zero-sum games. Yet prior work has
found that policies that are highly capable against regular opponents can fail
catastrophically against adversarial policies: an opponent trained explicitly
against the victim. Prior defenses using adversarial training were able to make
the victim robust to a specific adversary, but the victim remained vulnerable
to new ones. We conjecture this limitation was due to insufficient diversity of
adversaries seen during training. We propose a defense using population based
training to pit the victim against a diverse set of opponents. We evaluate this
defense's robustness against new adversaries in two low-dimensional
environments. Our defense increases robustness against adversaries, as measured
by the number of attacker training timesteps needed to exploit the victim. Furthermore, we
show that robustness is correlated with the size of the opponent population.
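The abstract describes the defense but gives no pseudocode. Below is a minimal, hypothetical Python sketch of the general idea it states (train the victim against opponents sampled from a growing population rather than a single adversary); all names here (Policy, play_episode, population_based_defense) are illustrative placeholders, not the authors' implementation.

```python
import random


class Policy:
    """Placeholder policy; a real agent would learn from rollouts (stubbed here)."""

    def act(self, observation):
        return 0  # stub action

    def update(self, trajectory):
        pass  # stub learning step


def play_episode(victim, opponent, env_steps=100):
    """Stub rollout: returns the joint trajectory of victim and opponent actions."""
    return [(victim.act(None), opponent.act(None)) for _ in range(env_steps)]


def population_based_defense(num_iterations=1000, add_opponent_every=100):
    """Hypothetical sketch: the victim trains against a population of opponents,
    sampling a different opponent each iteration to increase adversary diversity."""
    victim = Policy()
    population = [Policy()]                   # start from a single opponent
    for it in range(num_iterations):
        opponent = random.choice(population)  # diversity comes from sampling the population
        trajectory = play_episode(victim, opponent)
        victim.update(trajectory)             # victim adapts to the sampled opponent
        opponent.update(trajectory)           # opponents keep adapting to the victim
        if (it + 1) % add_opponent_every == 0:
            # grow the population; the abstract reports robustness increasing with its size
            population.append(Policy())
    return victim, population


if __name__ == "__main__":
    victim, population = population_based_defense(num_iterations=10, add_opponent_every=5)
    print(f"trained victim against {len(population)} opponents")
```

Robustness would then be measured as in the abstract: the number of training timesteps a fresh attacker needs before it exploits the trained victim.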
Related papers
- Gradient-Free Adversarial Purification with Diffusion Models [10.917491144598575]
Adversarial training and adversarial purification are effective methods to enhance a model's robustness against adversarial attacks.
We propose an effective and efficient adversarial defense method that counters both perturbation-based and unrestricted adversarial attacks.
arXiv Detail & Related papers (2025-01-23T02:34:14Z)
- Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness [11.722685584919757]
Adversarial training aims to defend against adversaries whose sole aim is to harm predictive performance in any way possible.
We propose to model opponents as simply pursuing their own goals, rather than working directly against the classifier.
We conduct a series of experiments that show how even mild knowledge regarding the opponent's incentives can be useful.
arXiv Detail & Related papers (2024-06-17T12:20:59Z)
- Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL [46.32591437241358]
In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $\alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework that has the flexibility to model to what extent the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
arXiv Detail & Related papers (2023-05-27T02:54:07Z)
- Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing [5.1024659285813785]
Adversarial training has been the most successful defense against adversarial attacks.
We propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training.
Our method achieves state-of-the-art robustness against $\ell_\infty$-norm constrained attacks.
arXiv Detail & Related papers (2023-03-24T15:41:40Z)
- Adversarial Machine Learning and Defense Game for NextG Signal Classification with Deep Learning [1.1726528038065764]
NextG systems can employ deep neural networks (DNNs) for various tasks such as user equipment identification, physical layer authentication, and detection of incumbent users.
This paper presents a game-theoretic framework to study the interactions of attack and defense for deep learning-based NextG signal classification.
arXiv Detail & Related papers (2022-12-22T15:13:03Z)
- Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
Adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a universal adversarial perturbation (UAP) does not attack all classes equally.
We improve state-of-the-art universal adversarial training (UAT) by utilizing class-wise UAPs during adversarial training.
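The summary only names the ingredients, so the following is a loose, assumption-laden sketch of universal adversarial training with one perturbation per class, not the paper's exact algorithm: each batch is perturbed with its class's universal perturbation, the model descends on that loss, and each class perturbation ascends along its sign gradient inside an L-infinity ball. Shapes, epsilon, and the step size are illustrative choices.

```python
import torch
import torch.nn.functional as F


def classwise_uat_step(model, optimizer, x, y, deltas, eps=8 / 255, delta_lr=0.01):
    """One illustrative step of universal adversarial training with per-class perturbations.
    `deltas` holds one universal perturbation per class, shape [num_classes, C, H, W];
    eps and delta_lr are placeholder values, not settings taken from the paper."""
    # Apply each example's class-specific perturbation and compute the adversarial loss.
    delta = deltas[y].clone().requires_grad_(True)
    logits = model(torch.clamp(x + delta, 0.0, 1.0))
    loss = F.cross_entropy(logits, y)

    # Model update: descend on the loss under the current class-wise perturbations.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Perturbation update: ascend each class's perturbation along the sign of its gradient,
    # then project back into the L-infinity ball of radius eps.
    with torch.no_grad():
        grad = delta.grad
        for c in y.unique():
            mask = y == c
            deltas[c] += delta_lr * grad[mask].mean(dim=0).sign()
        deltas.clamp_(-eps, eps)
    return loss.item()
```

A driver would initialize deltas = torch.zeros(num_classes, 3, 32, 32) and call this step inside an ordinary training loop.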
arXiv Detail & Related papers (2021-04-07T09:05:49Z)
- What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches.
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
arXiv Detail & Related papers (2021-02-26T17:54:36Z)
- Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses [59.58128343334556]
We introduce a relaxation term to the standard loss that finds more suitable gradient directions, increases attack efficacy, and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
- Harnessing adversarial examples with a surprisingly simple defense [47.64219291655723]
I introduce a very simple method to defend against adversarial examples.
The basic idea is to raise the slope of the ReLU function at test time.
Experiments over MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense.
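The idea is concrete enough to sketch. Assuming a PyTorch model and an illustrative slope value (not taken from the paper), the defense amounts to replacing each ReLU with a steeper one before evaluation:

```python
import torch
import torch.nn as nn


class SlopedReLU(nn.Module):
    """ReLU whose positive-part slope is a free parameter (slope=1.0 recovers standard ReLU)."""

    def __init__(self, slope: float = 1.0):
        super().__init__()
        self.slope = slope

    def forward(self, x):
        return self.slope * torch.relu(x)


def raise_relu_slope(model: nn.Module, slope: float = 5.0) -> nn.Module:
    """Recursively swap every nn.ReLU in a trained model for a steeper one before evaluation.
    The slope value here is an illustrative choice, not a setting from the paper."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, SlopedReLU(slope))
        else:
            raise_relu_slope(child, slope)
    return model
```

For example, raise_relu_slope(trained_model, slope=5.0).eval() changes inference only; training is left untouched.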
arXiv Detail & Related papers (2020-04-26T03:09:42Z)
- Attacks Which Do Not Kill Training Make Adversarial Learning Stronger [85.96849265039619]
Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models.
We argue that adversarial training should employ confident adversarial data to update the current model.
arXiv Detail & Related papers (2020-02-26T01:04:38Z)
- Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach toward ending this attack-defense cycle: we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.