Related papers: PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

URL: http://arxiv.org/abs/2306.04528v5
Date: Tue, 16 Jul 2024 07:29:49 GMT
Title: PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie,
Abstract summary: This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our findings demonstrate that contemporary Large Language Models are not robust to adversarial prompts.
Score: 76.18347405302728
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors like typos or synonyms, aim to evaluate how slight deviations can affect LLM outcomes while maintaining semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users.

Related papers

TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning [26.680686158061192]
Reasoning is a fundamental capability of large language models (LLMs) This paper introduces TextGames, a benchmark specifically crafted to assess LLMs through demanding text-based games. Our findings reveal that although LLMs exhibit proficiency in addressing most easy and medium-level problems, they face significant challenges with more difficult tasks.
arXiv Detail & Related papers (2025-02-25T18:26:48Z)
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context [49.13497493053742]
We focus on human-readable adversarial prompts, which are more realistic and potent threats. Our key contributions are (1) situation-driven attacks leveraging movie scripts as context to create human-readable prompts that successfully deceive LLMs, (2) adversarial suffix conversion to transform nonsensical adversarial suffixes into independent meaningful text, and (3) AdvPrompter with p-nucleus sampling, a method to generate diverse, human-readable adversarial suffixes.
arXiv Detail & Related papers (2024-12-20T21:43:52Z)
Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
Lack of context and unfamiliarity with difficult concepts is a major reason for adult readers' difficulty with domain-specific text. We introduce "targeted concept simplification," a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts. We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z)
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models. Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z)
Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established. This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt. We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language [41.052284715017606]
We study the ability of Large Language Models (LLMs) to produce persuasive text. As opposed to prior work which focuses on particular domains or types of persuasion, we conduct a general study across various domains. We construct the new dataset Persuasive-Pairs of pairs of pairs of a short text and its rewrite by an LLM to amplify or diminish persuasive language.
arXiv Detail & Related papers (2024-06-25T17:40:47Z)
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models [12.112914393948415]
We present RUPBench, a benchmark designed to evaluate large language models (LLMs) across diverse reasoning tasks. Our benchmark incorporates 15 reasoning datasets, categorized into commonsense, arithmetic, logical, and knowledge-intensive reasoning. By examining the performance of state-of-the-art LLMs such as GPT-4o, Llama3, Phi-3, and Gemma on both original and perturbed datasets, we provide a detailed analysis of their robustness and error patterns.
arXiv Detail & Related papers (2024-06-16T17:26:44Z)
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency [13.154753046052527]
We focus on consistency across languages as well as paraphrases. We find that the model's multisense consistency is lacking and run several follow-up analyses to verify. We conclude that, in this aspect, the understanding of LLMs is still quite far from being consistent and human-like.
arXiv Detail & Related papers (2024-04-18T12:48:17Z)
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments [37.03210795084276]
We examine whether the current format of prompting Large Language Models elicits responses in a consistent and robust manner. Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability. Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions.
arXiv Detail & Related papers (2023-11-16T09:50:53Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. Recent works have proposed algorithms to detect LLM-generated text and protect LLMs. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.