Can Go AIs be adversarially robust?
- URL: http://arxiv.org/abs/2406.12843v2
- Date: Tue, 24 Sep 2024 08:38:38 GMT
- Title: Can Go AIs be adversarially robust?
- Authors: Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave
- Abstract summary: We study whether adding natural countermeasures can achieve robustness in Go.
We find that though some of these defenses protect against previously discovered attacks, none withstand freshly trained adversaries.
Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings.
- Score: 4.466856575755327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks. In this paper, we study whether adding natural countermeasures can achieve robustness in Go, a favorable domain for robustness since it benefits from incredible average-case capability and a narrow, innately adversarial setting. We test three defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that though some of these defenses protect against previously discovered attacks, none withstand freshly trained adversaries. Furthermore, most of the reliably effective attacks these adversaries discover are different realizations of the same overall class of cyclic attacks. Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings, and highlight two key gaps: efficient generalization in defenses, and diversity in training. For interactive examples of attacks and a link to our codebase, see https://goattack.far.ai.
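The iterated adversarial training defense in the abstract alternates between training a fresh adversary against the current victim and fine-tuning the victim on the games it loses. The Python sketch below is a minimal illustration of that loop; the helper callables (`train_adversary`, `play_games`, `finetune`) are hypothetical placeholders, not the paper's KataGo-based pipeline.

```python
from typing import Callable, List, Tuple


def iterated_adversarial_training(
    victim,
    train_adversary: Callable,   # trains a fresh adversary to exploit `victim`
    play_games: Callable,        # plays games, returns (game, winner) pairs
    finetune: Callable,          # fine-tunes `victim` on a list of games
    num_rounds: int = 10,
    games_per_round: int = 1000,
):
    """Minimal sketch of iterated adversarial training (assumed helpers,
    not the paper's implementation). Each round: (1) train an adversary
    against the current victim, (2) collect the games the victim loses,
    (3) fine-tune the victim on those exploit games."""
    for _ in range(num_rounds):
        adversary = train_adversary(victim)
        games: List[Tuple[object, str]] = play_games(victim, adversary, games_per_round)
        exploit_games = [g for g, winner in games if winner == "adversary"]
        victim = finetune(victim, exploit_games)
    return victim
```

The paper's negative result is that even many rounds of a loop like this do not confer robustness: freshly trained adversaries keep finding new realizations of the cyclic attack.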
Related papers
- A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion [0.0]
Our proposal suggests a different approach to the AI Guardian framework.
Instead of including adversarial examples in the training process, we propose training the AI system without them.
This aims to create a system that is inherently resilient to a wider range of attacks.
arXiv Detail & Related papers (2024-05-03T04:08:15Z)
- Improving behavior based authentication against adversarial attack using XAI [3.340314613771868]
We propose an eXplainable AI (XAI) based defense strategy against adversarial attacks in such scenarios.
A feature selector, trained with our method, can be used as a filter in front of the original authenticator.
We demonstrate that our XAI based defense strategy is effective against adversarial attacks and outperforms other defense strategies.
arXiv Detail & Related papers (2024-02-26T09:29:05Z)
- Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples [25.029854308139853]
Adversarial examples on 3D point clouds are more challenging to defend against than those on 2D images.
In this paper, we first establish a comprehensive and rigorous point cloud adversarial robustness benchmark.
We then perform extensive and systematic experiments to identify an effective combination of these tricks.
We construct a more robust defense framework achieving an average accuracy of 83.45% against various attacks.
arXiv Detail & Related papers (2023-07-31T01:34:24Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A^5$ is a framework for crafting a defensive perturbation that guarantees any attack on the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground-truth label.
We also show how to apply $A^5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z)
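The deep-ranking entry above raises or lowers a chosen candidate's rank through small input perturbations. The PyTorch sketch below shows one generic way a rank-raising candidate perturbation could be implemented as projected gradient descent on embedding distance; the `embed` network, the distance loss, and the step sizes are illustrative assumptions, not the paper's exact attack.

```python
import torch


def raise_rank_attack(embed, candidate, queries, eps=8 / 255, alpha=2 / 255, steps=20):
    """Sketch of a rank-raising ("candidate") attack: perturb one candidate
    image so its embedding moves closer to a set of query embeddings and
    therefore ranks higher for those queries. Generic illustration only.

    embed:     differentiable embedding network, (N, C, H, W) -> (N, D)
    candidate: tensor of shape (1, C, H, W) with values in [0, 1]
    queries:   tensor of shape (Q, C, H, W) with values in [0, 1]
    """
    with torch.no_grad():
        q_emb = embed(queries)                      # fixed query embeddings (Q, D)

    delta = torch.zeros_like(candidate, requires_grad=True)
    for _ in range(steps):
        c_emb = embed(candidate + delta)            # perturbed candidate embedding (1, D)
        loss = torch.cdist(c_emb, q_emb).mean()     # mean distance to the queries
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()      # step toward the query embeddings
            delta.clamp_(-eps, eps)                 # L_inf perturbation budget
            delta.copy_((candidate + delta).clamp(0, 1) - candidate)  # keep pixels valid
        delta.grad.zero_()
    return (candidate + delta).detach()
```

Lowering a candidate's rank is the mirror image: ascend on the same distance instead of descending.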
- Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
Adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a universal adversarial perturbation (UAP) does not attack all classes equally.
We improve state-of-the-art universal adversarial training (UAT) by using class-wise UAPs during adversarial training.
arXiv Detail & Related papers (2021-04-07T09:05:49Z)
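The class-wise UAP entry above maintains a separate universal perturbation per class and trains the model against them. The sketch below shows one plausible training step under that idea; the buffer layout, update rule, and hyperparameters are assumptions for illustration, not the authors' exact algorithm.

```python
import torch
import torch.nn.functional as F


def classwise_uat_step(model, optimizer, x, y, uaps, eps=8 / 255, uap_lr=1e-2):
    """One sketched step of class-wise universal adversarial training.
    `uaps` is a persistent (num_classes, C, H, W) buffer holding one
    universal perturbation per class; names and step sizes are illustrative."""
    # 1. Look up each example's class-specific universal perturbation.
    delta = uaps[y].clone().requires_grad_(True)

    # 2. Gradient ascent: strengthen each class's UAP against the model.
    adv_loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(adv_loss, delta)[0]
    with torch.no_grad():
        for c in y.unique():
            mask = y == c
            uaps[c] += uap_lr * grad[mask].mean(dim=0).sign()
        uaps.clamp_(-eps, eps)                       # per-class L_inf budget

    # 3. Gradient descent: train the model on inputs hit by the updated UAPs.
    optimizer.zero_grad()
    loss = F.cross_entropy(model((x + uaps[y]).clamp(0, 1)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```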
- TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask) [28.785693760449604]
TROJANZOO is the first open-source platform for evaluating neural backdoor attacks/defenses.
It has 12 representative attacks, 15 state-of-the-art defenses, 6 attack performance metrics, 10 defense utility metrics, as well as rich tools for analysis of attack-defense interactions.
We conduct a systematic study of existing attacks/defenses, leading to a number of interesting findings.
arXiv Detail & Related papers (2020-12-16T22:37:27Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Harnessing adversarial examples with a surprisingly simple defense [47.64219291655723]
I introduce a very simple method to defend against adversarial examples.
The basic idea is to raise the slope of the ReLU function at test time.
Experiments over MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense.
arXiv Detail & Related papers (2020-04-26T03:09:42Z)
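The last entry's defense is nothing more than a steeper ReLU used only at inference. Below is a minimal PyTorch sketch of that idea; the class and helper names are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn


class SteepReLU(nn.Module):
    """ReLU with an adjustable positive-side slope."""

    def __init__(self, slope: float = 1.0):
        super().__init__()
        self.slope = slope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.slope * torch.relu(x)


def raise_relu_slope(model: nn.Module, slope: float = 2.0) -> nn.Module:
    """Replace every nn.ReLU in a trained model with a steeper ReLU.
    The model is trained with ordinary ReLUs (slope 1) and only evaluated
    with the raised slope, as the defense above describes."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, SteepReLU(slope))
        else:
            raise_relu_slope(child, slope)
    return model
```

Training is unchanged; only evaluation calls, e.g., `raise_relu_slope(model, slope=2.0)` before running the test set.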