Can Go AIs be adversarially robust?
- URL: http://arxiv.org/abs/2406.12843v1
- Date: Tue, 18 Jun 2024 17:57:49 GMT
- Title: Can Go AIs be adversarially robust?
- Authors: Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave
- Abstract summary: We study if simple defenses can improve KataGo's worst-case performance.
We find that none of these defenses are able to withstand adaptive attacks.
Our results suggest that building robust AI systems is challenging even in narrow domains such as Go.
- Score: 4.466856575755327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect against previously discovered attacks. Unfortunately, we also find that none of these defenses are able to withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even in narrow domains such as Go. For interactive examples of attacks and a link to our codebase, see https://goattack.far.ai.
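The iterated adversarial training named in the abstract alternates between training an adversary to exploit the current victim and fine-tuning the victim against the discovered exploit. A minimal toy sketch of that loop (the function names, the finite `attack_space`, and the `find_exploit` search are all illustrative stand-ins, not the paper's actual KataGo training pipeline):

```python
def find_exploit(patched, attack_space):
    """Stand-in for adversary training: search a toy, finite attack
    space for a strategy the victim has not yet been hardened against."""
    for attack in attack_space:
        if attack not in patched:
            return attack
    return None  # no exploit found within this attack space


def iterated_adversarial_training(attack_space, iterations):
    """Alternate rounds: the adversary finds an exploit, the victim
    'fine-tunes' by patching it. Robustness only covers attacks that
    were actually discovered during training, which is why adaptive
    attacks outside the searched space can still succeed."""
    patched = set()
    for _ in range(iterations):
        exploit = find_exploit(patched, attack_space)
        if exploit is None:
            break  # victim defeats every known attack
        patched.add(exploit)
    return patched
```

An adaptive attacker then simply searches beyond the defended set: calling `find_exploit` with an attack space containing a strategy never seen in training returns that unpatched strategy, mirroring the paper's finding that newly trained adversaries reliably defeat the defended agents.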
Related papers
- A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion [0.0]
Our proposal suggests a different approach to the AI Guardian framework.
Instead of including adversarial examples in the training process, we propose training the AI system without them.
This aims to create a system that is inherently resilient to a wider range of attacks.
arXiv Detail & Related papers (2024-05-03T04:08:15Z) - The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A5$ is a framework for crafting a defensive perturbation that guarantees any attack on the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label.
We also show how to apply $A5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z) - Adversarial Policies Beat Superhuman Go AIs [54.15639517188804]
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it.
Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders.
Our results demonstrate that even superhuman AI systems may harbor surprising failure modes.
arXiv Detail & Related papers (2022-11-01T03:13:20Z) - Defending Against Stealthy Backdoor Attacks [1.6453255188693543]
Recent works have shown that attacking a natural language processing (NLP) model is not difficult, while defending against such attacks remains a cat-and-mouse game.
In this work, we present several defense strategies that can be used to counter such attacks.
arXiv Detail & Related papers (2022-05-27T21:38:42Z) - The Threat of Offensive AI to Organizations [52.011307264694665]
This survey explores the threat of offensive AI to organizations.
First, we discuss how AI changes the adversary's methods, strategies, goals, and overall attack model.
Then, through a literature review, we identify 33 offensive AI capabilities which adversaries can use to enhance their attacks.
arXiv Detail & Related papers (2021-06-30T01:03:28Z) - What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches.
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
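The defense summarized above can be sketched as batch-level poison injection: craft poisoned examples on the fly and mix them into each training batch. Everything in this sketch (the trigger value, the crafting rule, the poison fraction) is a hypothetical toy, not the paper's actual method:

```python
import random

TRIGGER = 99.0  # hypothetical backdoor trigger value


def craft_poison(example, target_label):
    """Toy poison: stamp a trigger feature onto a clean example and flip
    its label, mimicking a backdoor attack. Purely illustrative."""
    features, _ = example
    return ([TRIGGER] + features[1:], target_label)


def poisoned_batch(clean_batch, poison_fraction=0.25, target_label=1):
    """Create poisons during training and inject them into the batch,
    so the trained model is desensitized to the trigger."""
    k = max(1, int(len(clean_batch) * poison_fraction))
    poisons = [craft_poison(ex, target_label)
               for ex in random.sample(clean_batch, k)]
    return clean_batch + poisons
```

Training on such augmented batches exposes the model to trigger-stamped inputs alongside clean ones, which is the mechanism the summary describes for blunting poisoning and backdoor attacks.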
arXiv Detail & Related papers (2021-02-26T17:54:36Z) - Mitigating Advanced Adversarial Attacks with More Advanced Gradient Obfuscation Techniques [13.972753012322126]
Deep Neural Networks (DNNs) are well known to be vulnerable to Adversarial Examples (AEs).
Recently, advanced gradient-based attack techniques were proposed.
In this paper, we make a steady step towards mitigating those advanced gradient-based attacks.
arXiv Detail & Related papers (2020-05-27T23:42:25Z) - Certified Defenses for Adversarial Patches [72.65524549598126]
Adversarial patch attacks are among the most practical threat models against real-world computer vision systems.
This paper studies certified and empirical defenses against patch attacks.
arXiv Detail & Related papers (2020-03-14T19:57:31Z) - Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach toward ending this attack-defense cycle: we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.