A test suite of prompt injection attacks for LLM-based machine translation
- URL: http://arxiv.org/abs/2410.05047v1
- Date: Mon, 7 Oct 2024 14:01:20 GMT
- Title: A test suite of prompt injection attacks for LLM-based machine translation
- Authors: Antonio Valerio Miceli-Barone, Zhifan Sun,
- Abstract summary: LLM-based NLP systems typically work by embedding their input data into prompt templates which contain instructions and/or in-context examples.
Recently, Sun and Miceli-Barone proposed a class of PIAs against LLM-based machine translation.
We extend this approach to all the language pairs of the WMT 2024 General Machine Translation task.
- Score: 4.459306403129608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based NLP systems typically work by embedding their input data into prompt templates which contain instructions and/or in-context examples, creating queries which are submitted to a LLM, and then parsing the LLM response in order to generate the system outputs. Prompt Injection Attacks (PIAs) are a type of subversion of these systems where a malicious user crafts special inputs which interfere with the prompt templates, causing the LLM to respond in ways unintended by the system designer. Recently, Sun and Miceli-Barone proposed a class of PIAs against LLM-based machine translation. Specifically, the task is to translate questions from the TruthfulQA test suite, where an adversarial prompt is prepended to the questions, instructing the system to ignore the translation instruction and answer the questions instead. In this test suite, we extend this approach to all the language pairs of the WMT 2024 General Machine Translation task. Moreover, we include additional attack formats in addition to the one originally studied.
Related papers
- Aligning LLMs to Be Robust Against Prompt Injection [55.07562650579068]
We show that alignment can be a powerful tool to make LLMs more robust against prompt injection attacks.
Our method -- SecAlign -- first builds an alignment dataset by simulating prompt injection attacks.
Our experiments show that SecAlign robustifies the LLM substantially with a negligible hurt on model utility.
arXiv Detail & Related papers (2024-10-07T19:34:35Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting [54.931241667414184]
We propose textbfAdaptive textbfShield Prompting, which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks.
Our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks.
arXiv Detail & Related papers (2024-03-14T15:57:13Z) - Defending LLMs against Jailbreaking Attacks via Backtranslation [61.878363293735624]
We propose a new method for defending LLMs against jailbreaking attacks by backtranslation''
The inferred prompt is called the backtranslated prompt which tends to reveal the actual intent of the original prompt.
We empirically demonstrate that our defense significantly outperforms the baselines.
arXiv Detail & Related papers (2024-02-26T10:03:33Z) - StruQ: Defending Against Prompt Injection with Structured Queries [10.22774624798198]
Large Language Models (LLMs) can perform text-based tasks by utilizing their advanced language understanding capabilities.
Prompt injection attacks are an important threat, they trick the model into deviating from the original application's instructions and instead follow user directives.
We introduce structured queries, a general approach to tackle this problem.
Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility.
arXiv Detail & Related papers (2024-02-09T12:15:51Z) - ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation [10.503097140635374]
ChIRAAG, based on OpenAI GPT4, generates System Verilog Assertion (SVA) from natural language specifications of a design.
In experiments, only 27% of LLM-generated raw assertions had errors, which was rectified in few iterations.
Our results show that LLMs can streamline and assist engineers in the assertion generation process, reshaping verification.
arXiv Detail & Related papers (2024-01-31T12:41:27Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT [1.2640882896302839]
This paper provides contributions to research on prompt engineering that apply large language models (LLMs) to automate software development tasks.
It provides a framework for documenting patterns for structuring prompts to solve a range of problems so that they can be adapted to different domains.
Third, it explains how prompts can be built from multiple patterns and illustrates prompt patterns that benefit from combination with other prompt patterns.
arXiv Detail & Related papers (2023-02-21T12:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.