Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
- URL: http://arxiv.org/abs/2410.20911v2
- Date: Mon, 18 Nov 2024 09:15:46 GMT
- Title: Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
- Authors: Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese,
- Abstract summary: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks.
Mantis is a framework that exploits LLMs' susceptibility to adversarial inputs to undermine malicious operations.
Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks.
- Score: 15.726286532500971
- License:
- Abstract: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs' susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker's LLM to disrupt their own operations (passive defense) or even compromise the attacker's machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker's LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks. To foster further research and collaboration, Mantis is available as an open-source tool: https://github.com/pasquini-dario/project_mantis
Related papers
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques [66.65466992544728]
Large language models (LLMs) have achieved remarkable performance across various natural language processing (NLP) tasks.
As LLMs continue to evolve, new vulnerabilities, especially prompt injection attacks arise.
Recent attack methods leverage LLMs' instruction-following abilities and their inabilities to distinguish instructions injected in the data content.
arXiv Detail & Related papers (2024-11-01T09:14:21Z) - The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks [2.6528263069045126]
Large language models (LLMs) could soon become integral to autonomous cyber agents.
We introduce novel defense strategies that exploit the inherent vulnerabilities of attacking LLMs.
Our results show defense success rates of up to 90%, demonstrating the effectiveness of turning LLM vulnerabilities into defensive strategies.
arXiv Detail & Related papers (2024-10-20T14:07:24Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game [28.33029508522531]
Malicious attackers induce large models to jailbreak and generate information containing illegal, privacy-invasive information.
Large models counter malicious attackers' attacks using techniques such as safety alignment.
We propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent.
arXiv Detail & Related papers (2024-04-03T07:43:11Z) - AutoAttacker: A Large Language Model Guided System to Implement
Automatic Cyber-attacks [13.955084410934694]
Large language models (LLMs) have demonstrated impressive results on natural language tasks.
As LLMs inevitably advance, they may be able to automate both the pre- and post-breach attack stages.
This research can help defensive systems and teams learn to detect novel attack behaviors preemptively before their use in the wild.
arXiv Detail & Related papers (2024-03-02T00:10:45Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - Arms Race in Adversarial Malware Detection: A Survey [33.8941961394801]
Malicious software (malware) is a major cyber threat that has to be tackled with Machine Learning (ML) techniques.
ML is vulnerable to attacks known as adversarial examples.
Knowing the defender's feature set is critical to the success of transfer attacks.
The effectiveness of adversarial training depends on the defender's capability in identifying the most powerful attack.
arXiv Detail & Related papers (2020-05-24T07:20:42Z) - On Certifying Robustness against Backdoor Attacks via Randomized
Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks.
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.