L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
- URL: http://arxiv.org/abs/2401.15335v2
- Date: Wed, 22 May 2024 11:40:21 GMT
- Title: L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
- Authors: Ping Guo, Fei Liu, Xi Lin, Qingchuan Zhao, Qingfu Zhang,
- Abstract summary: This work introduces L-AutoDA, a novel approach leveraging the generative capabilities of Large Language Models (LLMs) to automate the design of adversarial attacks.
By iteratively interacting with LLMs in an evolutionary framework, L-AutoDA automatically designs competitive attack algorithms efficiently without much human effort.
We demonstrate the efficacy of L-AutoDA on CIFAR-10 dataset, showing significant improvements over baseline methods in both success rate and computational efficiency.
- Score: 16.457528502745415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of machine learning, adversarial attacks present a significant challenge to model robustness and security. Decision-based attacks, which only require feedback on the decision of a model rather than detailed probabilities or scores, are particularly insidious and difficult to defend against. This work introduces L-AutoDA (Large Language Model-based Automated Decision-based Adversarial Attacks), a novel approach leveraging the generative capabilities of Large Language Models (LLMs) to automate the design of these attacks. By iteratively interacting with LLMs in an evolutionary framework, L-AutoDA automatically designs competitive attack algorithms efficiently without much human effort. We demonstrate the efficacy of L-AutoDA on CIFAR-10 dataset, showing significant improvements over baseline methods in both success rate and computational efficiency. Our findings underscore the potential of language models as tools for adversarial attack generation and highlight new avenues for the development of robust AI systems.
Related papers
- Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection [21.003217781832923]
This paper proposes an Automated Machine Learning (AutoML)-based autonomous IDS framework towards achieving autonomous cybersecurity for next-generation networks.
The proposed AutoML-based IDS was evaluated on two public benchmark network security datasets, CICIDS 2017 and 5G-NIDD.
This research marks a significant step towards fully autonomous cybersecurity in next-generation networks, potentially revolutionizing network security applications.
arXiv Detail & Related papers (2024-09-05T00:36:23Z) - SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models [19.486685336959482]
Large language models (LLMs) continue to advance in capability and influence, ensuring their security and preventing harmful outputs has become crucial.
A promising approach to address these concerns involves training models to automatically generate adversarial prompts for red teaming.
We introduce the $mathbfStextelf-mathbfEtextvolving mathbfAtextdversarial mathbfStextafetyety mathbf(SEAS)$ optimization framework.
SEAS operates through three iterative stages: Initialization, Attack, and Adversa
arXiv Detail & Related papers (2024-08-05T16:55:06Z) - Position: A Call to Action for a Human-Centered AutoML Paradigm [83.78883610871867]
Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML)
We argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems.
arXiv Detail & Related papers (2024-06-05T15:05:24Z) - Defending Large Language Models Against Attacks With Residual Stream Activation Analysis [0.0]
Large Language Models (LLMs) are vulnerable to adversarial threats.
This paper presents an innovative defensive strategy, given white box access to an LLM.
We apply a novel methodology for analyzing distinctive activation patterns in the residual streams for attack prompt classification.
arXiv Detail & Related papers (2024-06-05T13:06:33Z) - Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models.
We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks.
We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
arXiv Detail & Related papers (2024-05-28T19:16:17Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - InferAligner: Inference-Time Alignment for Harmlessness through
Cross-Model Guidance [56.184255657175335]
We develop textbfInferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
arXiv Detail & Related papers (2024-01-20T10:41:03Z) - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z) - OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge
Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z) - A Generative Model based Adversarial Security of Deep Learning and
Linear Classifier Models [0.0]
We have proposed a mitigation method for adversarial attacks against machine learning models with an autoencoder model.
The main idea behind adversarial attacks against machine learning models is to produce erroneous results by manipulating trained models.
We have also presented the performance of autoencoder models to various attack methods from deep neural networks to traditional algorithms.
arXiv Detail & Related papers (2020-10-17T17:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.