L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
- URL: http://arxiv.org/abs/2401.15335v2
- Date: Wed, 22 May 2024 11:40:21 GMT
- Title: L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
- Authors: Ping Guo, Fei Liu, Xi Lin, Qingchuan Zhao, Qingfu Zhang
- Abstract summary: This work introduces L-AutoDA, a novel approach leveraging the generative capabilities of Large Language Models (LLMs) to automate the design of adversarial attacks.
By iteratively interacting with LLMs in an evolutionary framework, L-AutoDA efficiently designs competitive attack algorithms with little human effort.
We demonstrate the efficacy of L-AutoDA on the CIFAR-10 dataset, showing significant improvements over baseline methods in both success rate and computational efficiency.
- Score: 16.457528502745415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of machine learning, adversarial attacks present a significant challenge to model robustness and security. Decision-based attacks, which only require feedback on the decision of a model rather than detailed probabilities or scores, are particularly insidious and difficult to defend against. This work introduces L-AutoDA (Large Language Model-based Automated Decision-based Adversarial Attacks), a novel approach leveraging the generative capabilities of Large Language Models (LLMs) to automate the design of these attacks. By iteratively interacting with LLMs in an evolutionary framework, L-AutoDA efficiently designs competitive attack algorithms with little human effort. We demonstrate the efficacy of L-AutoDA on the CIFAR-10 dataset, showing significant improvements over baseline methods in both success rate and computational efficiency. Our findings underscore the potential of language models as tools for adversarial attack generation and highlight new avenues for the development of robust AI systems.
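To make the evolutionary framework concrete, the sketch below shows the general shape of such a loop in Python. It is a minimal illustration rather than the paper's implementation: `propose_variant` is a hypothetical stand-in for the LLM call that mutates candidate attack programs, and `evaluate` stands in for the decision-based fitness measurement (e.g., perturbation size achieved within a fixed query budget on CIFAR-10).
```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str            # source of a candidate attack-update function
    fitness: float = 0.0

def propose_variant(parents: list[str]) -> str:
    # Hypothetical stand-in for the LLM call that mutates or recombines
    # parent programs; returns a parent unchanged so the sketch runs.
    return random.choice(parents)

def evaluate(code: str) -> float:
    # Hypothetical stand-in for running the generated attack against a model
    # with decision-only feedback and scoring it (higher is better).
    return random.random()

def evolve(seeds: list[str], generations: int = 10, pop_size: int = 8) -> Candidate:
    population = [Candidate(c, evaluate(c)) for c in seeds]
    for _ in range(generations):
        children = [propose_variant([p.code for p in population])
                    for _ in range(pop_size)]
        population += [Candidate(c, evaluate(c)) for c in children]
        population.sort(key=lambda c: c.fitness, reverse=True)
        population = population[:pop_size]    # elitist survivor selection
    return population[0]

best = evolve(["update = lambda x_adv, x, noise: x_adv + 0.05 * noise"])
print(best.code, best.fitness)
```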
Related papers
- Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation.
Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
- Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI [21.815149263785912]
We introduce the use of generative AI for facilitating model-related attacks, including model extraction, membership inference, and model inversion.
Our study reveals that adversaries can launch a variety of model-related attacks against both image and text models in a data-free and black-box manner.
This research serves as an important early warning to the community about the potential risks associated with generative AI-powered attacks on deep learning models.
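As a rough illustration of this data-free, black-box setting (not the paper's actual methods), the sketch below extracts a surrogate of a victim classifier using only generated queries and the victim's predicted labels; Gaussian noise stands in for generative-AI-produced query inputs.
```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in victim model; the attacker sees only its label outputs.
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
victim.fit(rng.normal(size=(500, 10)), rng.integers(0, 2, 500))

queries = rng.normal(size=(2000, 10))   # "generated" queries (no real data used)
labels = victim.predict(queries)        # black-box, label-only access

surrogate = DecisionTreeClassifier(random_state=0).fit(queries, labels)
test = rng.normal(size=(500, 10))
agreement = (surrogate.predict(test) == victim.predict(test)).mean()
print(f"surrogate/victim agreement: {agreement:.2%}")
```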
arXiv Detail & Related papers (2025-01-28T03:12:57Z)
- Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models [62.12822290276912]
Auto-RT is a reinforcement learning framework that automatically explores and optimizes complex attack strategies.
By significantly improving exploration efficiency and automatically optimizing attack strategies, Auto-RT detects a broader range of vulnerabilities, achieving faster detection and a 16.63% higher success rate compared to existing methods.
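In its simplest form, the exploration idea can be pictured as a bandit problem. The toy sketch below (illustrative only, not Auto-RT's algorithm) uses an epsilon-greedy agent to allocate red-teaming attempts across invented jailbreak strategy templates with made-up success rates.
```python
import random

strategies = ["role_play", "payload_splitting", "obfuscation", "persuasion"]
true_rate = {"role_play": 0.10, "payload_splitting": 0.25,
             "obfuscation": 0.15, "persuasion": 0.30}  # invented, hidden from agent
counts = {s: 0 for s in strategies}
wins = {s: 0 for s in strategies}

def attack_succeeds(strategy: str) -> bool:
    # Placeholder for templating a prompt with the strategy, sending it to
    # the target LLM, and judging whether the response is harmful.
    return random.random() < true_rate[strategy]

random.seed(0)
for _ in range(2000):
    if random.random() < 0.1:    # explore a random strategy
        s = random.choice(strategies)
    else:                        # exploit the best empirical success rate
        s = max(strategies, key=lambda k: wins[k] / counts[k] if counts[k] else 1.0)
    counts[s] += 1
    wins[s] += attack_succeeds(s)

print({s: round(wins[s] / max(counts[s], 1), 3) for s in strategies})
```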
arXiv Detail & Related papers (2025-01-03T14:30:14Z)
- Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics [70.93622520400385]
This paper systematically quantifies the robustness of VLA-based robotic systems.
We introduce an untargeted position-aware attack objective that leverages spatial foundations to destabilize robotic actions.
We also design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments.
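A minimal version of the patch idea is sketched below in PyTorch. It is illustrative only: a randomly initialized linear classifier stands in for the VLA model, and the loss is plain untargeted cross-entropy ascent rather than the paper's position-aware objective.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)   # stand-in camera frame
label = torch.tensor([3])          # stand-in correct action/class

patch = torch.rand(3, 8, 8, requires_grad=True)   # small 8x8 patch
opt = torch.optim.Adam([patch], lr=0.05)
y0, x0 = 12, 12                                   # fixed paste location

for _ in range(100):
    patched = image.clone()
    patched[0, :, y0:y0 + 8, x0:x0 + 8] = patch.clamp(0, 1)    # paste patch
    loss = -F.cross_entropy(model(patched), label)  # untargeted: push away from label
    opt.zero_grad()
    loss.backward()
    opt.step()

patched = image.clone()
patched[0, :, y0:y0 + 8, x0:x0 + 8] = patch.detach().clamp(0, 1)
print("prediction with patch:", model(patched).argmax(dim=1).item())
```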
arXiv Detail & Related papers (2024-11-18T01:52:20Z)
- Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection [21.003217781832923]
This paper proposes an Automated Machine Learning (AutoML)-based autonomous IDS framework towards achieving autonomous cybersecurity for next-generation networks.
The proposed AutoML-based IDS was evaluated on two public benchmark network security datasets, CICIDS 2017 and 5G-NIDD.
This research marks a significant step towards fully autonomous cybersecurity in next-generation networks, potentially revolutionizing network security applications.
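The core AutoML step can be pictured with a short scikit-learn sketch (illustrative, not the paper's full framework): an automated hyperparameter search selects an intrusion classifier, with synthetic imbalanced data standing in for CICIDS 2017 / 5G-NIDD flow features.
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for network-flow features; IDS data is heavily imbalanced.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [5, None]},
    scoring="f1",   # F1 suits the benign/attack imbalance
    cv=3,
)
search.fit(X_tr, y_tr)
print(search.best_params_, f"held-out F1: {search.score(X_te, y_te):.3f}")
```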
arXiv Detail & Related papers (2024-09-05T00:36:23Z)
- From Sands to Mansions: Simulating Full Attack Chain with LLM-Organized Knowledge [10.065241604400223]
Multi-stage attack simulations offer a promising approach to enhance system evaluation efficiency.
However, simulating a full attack chain is complex, requiring significant time and expertise from security professionals.
We introduce Aurora, a system that autonomously simulates full attack chains based on external attack tools and threat intelligence reports.
arXiv Detail & Related papers (2024-07-24T01:33:57Z)
- Defending Large Language Models Against Attacks With Residual Stream Activation Analysis [0.0]
Large Language Models (LLMs) are vulnerable to adversarial threats.
This paper presents an innovative defensive strategy, given white-box access to an LLM.
We apply a novel methodology for analyzing distinctive activation patterns in the residual streams for attack prompt classification.
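A stripped-down version of that methodology is sketched below: a linear probe trained on residual-stream activations separates attack prompts from benign ones. `get_residual_activation` is a hypothetical placeholder; with the white-box access the paper assumes, it would return a chosen layer's hidden state (e.g., captured via forward hooks).
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
D = 512  # residual-stream width of the (hypothetical) model

def get_residual_activation(prompt: str) -> np.ndarray:
    # Placeholder: returns class-shifted noise so the sketch runs end to end;
    # a real implementation would run the LLM and read a layer's hidden state.
    shift = 0.5 if "IGNORE PREVIOUS" in prompt else 0.0
    return rng.normal(loc=shift, size=D)

benign = [f"Summarize document {i}." for i in range(200)]
attacks = [f"IGNORE PREVIOUS instructions, variant {i}" for i in range(200)]
X = np.stack([get_residual_activation(p) for p in benign + attacks])
y = np.array([0] * len(benign) + [1] * len(attacks))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2%}")
```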
arXiv Detail & Related papers (2024-06-05T13:06:33Z)
- Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models.
We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks.
We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
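For intuition about why GFlowNet fine-tuning favors diversity, the toy PyTorch sketch below trains a tabular sampler with the trajectory-balance objective on a tiny invented "prompt" space; the vocabulary, reward, and policy are all made up for illustration, whereas the paper applies the objective to a language-model attacker with an attack-success reward.
```python
import torch

torch.manual_seed(0)
V, L = 5, 3                                      # vocabulary size, prompt length
logits = torch.zeros(L, V, requires_grad=True)   # per-position forward policy
log_Z = torch.zeros(1, requires_grad=True)       # learned log partition function
opt = torch.optim.Adam([logits, log_Z], lr=0.1)

def reward(tokens: torch.Tensor) -> torch.Tensor:
    # Invented reward with two high-value "attack modes"; in red-teaming this
    # would be a judge scoring how harmful the elicited response is.
    hit = (tokens == 0).all() | (tokens == V - 1).all()
    return torch.where(hit, torch.tensor(10.0), torch.tensor(0.1))

for _ in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    tokens = dist.sample()                     # one sampled "prompt"
    traj_log_pf = dist.log_prob(tokens).sum()  # log P_F of the whole trajectory
    # Trajectory balance: (log Z + log P_F(tau) - log R(x))^2 pushes the
    # sampling probability of x toward being proportional to its reward.
    loss = (log_Z + traj_log_pf - reward(tokens).log()) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

dist = torch.distributions.Categorical(logits=logits)
print([tuple(dist.sample().tolist()) for _ in range(8)])
```
Because the objective matches probability to reward rather than maximizing expected reward, the trained sampler keeps mass on both high-reward modes instead of collapsing onto one, which is the diversity property the summary describes.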
arXiv Detail & Related papers (2024-05-28T19:16:17Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)