PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
- URL: http://arxiv.org/abs/2306.04528v5
- Date: Tue, 16 Jul 2024 07:29:49 GMT
- Title: PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
- Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie,
- Abstract summary: This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic.
The adversarial prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving.
Our findings demonstrate that contemporary Large Language Models are not robust to adversarial prompts.
- Score: 76.18347405302728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors like typos or synonyms, aim to evaluate how slight deviations can affect LLM outcomes while maintaining semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users.
Related papers
- Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL [64.3268313484078]
Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education and healthcare.<n>Their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns.<n>We investigate the extent to which LLMs engage in deception within dialogue, and propose the belief misalignment metric to quantify deception.
arXiv Detail & Related papers (2025-10-16T05:29:36Z) - Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs [34.51801559719707]
High prompt sensitivity has been widely accepted as a core limitation of large language models.<n>This work asks: Is the widely reported high prompt sensitivity truly an inherent weakness of LLMs, or is it largely an artifact of evaluation processes?<n>We find that much of the prompt sensitivity stems from evaluation methods, including log-likelihood scoring and rigid answer matching.
arXiv Detail & Related papers (2025-09-01T21:38:28Z) - Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review [17.869642243653985]
Large Language Models (LLMs) are increasingly being integrated into the scientific peer-review process.<n>We investigate the potential for hidden prompt injection attacks, where authors embed adversarial text within a paper's PDF.
arXiv Detail & Related papers (2025-08-28T14:57:04Z) - A Multi-Task Evaluation of LLMs' Processing of Academic Text Input [6.654906601143054]
How much large language models (LLMs) can aid scientific discovery, notably in assisting academic peer review is in heated debate.<n>We organize individual tasks that computer science studies employ in separate terms into a guided and robust workflow to evaluate LLMs' processing of academic text input.<n>We employ four tasks in the assessment: content reproduction/comparison/scoring/reflection, each demanding a specific role of the LLM.
arXiv Detail & Related papers (2025-08-15T19:05:57Z) - TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning [26.680686158061192]
Reasoning is a fundamental capability of large language models (LLMs)
This paper introduces TextGames, a benchmark specifically crafted to assess LLMs through demanding text-based games.
Our findings reveal that although LLMs exhibit proficiency in addressing most easy and medium-level problems, they face significant challenges with more difficult tasks.
arXiv Detail & Related papers (2025-02-25T18:26:48Z) - Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context [49.13497493053742]
We focus on human-readable adversarial prompts, which are more realistic and potent threats.
Our key contributions are (1) situation-driven attacks leveraging movie scripts as context to create human-readable prompts that successfully deceive LLMs, (2) adversarial suffix conversion to transform nonsensical adversarial suffixes into independent meaningful text, and (3) AdvPrompter with p-nucleus sampling, a method to generate diverse, human-readable adversarial suffixes.
arXiv Detail & Related papers (2024-12-20T21:43:52Z) - Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
Lack of context and unfamiliarity with difficult concepts is a major reason for adult readers' difficulty with domain-specific text.
We introduce "targeted concept simplification," a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts.
We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z) - ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models.
Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language [41.052284715017606]
We study the ability of Large Language Models (LLMs) to produce persuasive text.
As opposed to prior work which focuses on particular domains or types of persuasion, we conduct a general study across various domains.
We construct the new dataset Persuasive-Pairs of pairs of pairs of a short text and its rewrite by an LLM to amplify or diminish persuasive language.
arXiv Detail & Related papers (2024-06-25T17:40:47Z) - RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models [12.112914393948415]
We present RUPBench, a benchmark designed to evaluate large language models (LLMs) across diverse reasoning tasks.
Our benchmark incorporates 15 reasoning datasets, categorized into commonsense, arithmetic, logical, and knowledge-intensive reasoning.
By examining the performance of state-of-the-art LLMs such as GPT-4o, Llama3, Phi-3, and Gemma on both original and perturbed datasets, we provide a detailed analysis of their robustness and error patterns.
arXiv Detail & Related papers (2024-06-16T17:26:44Z) - From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency [13.154753046052527]
We focus on consistency across languages as well as paraphrases.
We find that the model's multisense consistency is lacking and run several follow-up analyses to verify.
We conclude that, in this aspect, the understanding of LLMs is still quite far from being consistent and human-like.
arXiv Detail & Related papers (2024-04-18T12:48:17Z) - You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments [37.03210795084276]
We examine whether the current format of prompting Large Language Models elicits responses in a consistent and robust manner.
Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability.
Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions.
arXiv Detail & Related papers (2023-11-16T09:50:53Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.