Reasoning Robustness of LLMs to Adversarial Typographical Errors
- URL: http://arxiv.org/abs/2411.05345v1
- Date: Fri, 08 Nov 2024 05:54:05 GMT
- Title: Reasoning Robustness of LLMs to Adversarial Typographical Errors
- Authors: Esther Gan, Yiran Zhao, Liying Cheng, Yancan Mao, Anirudh Goyal, Kenji Kawaguchi, Min-Yen Kan, Michael Shieh,
- Abstract summary: Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning using Chain-of-Thought (CoT) prompting.
We study the reasoning robustness of LLMs to typographical errors, which can naturally occur in users' queries.
We design an Adversarial Typo Attack ($texttATA$) algorithm that iteratively samples typos for words that are important to the query and selects the edit that is most likely to succeed in attacking.
- Score: 49.99118660264703
- License:
- Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning using Chain-of-Thought (CoT) prompting. However, CoT can be biased by users' instruction. In this work, we study the reasoning robustness of LLMs to typographical errors, which can naturally occur in users' queries. We design an Adversarial Typo Attack ($\texttt{ATA}$) algorithm that iteratively samples typos for words that are important to the query and selects the edit that is most likely to succeed in attacking. It shows that LLMs are sensitive to minimal adversarial typographical changes. Notably, with 1 character edit, Mistral-7B-Instruct's accuracy drops from 43.7% to 38.6% on GSM8K, while with 8 character edits the performance further drops to 19.2%. To extend our evaluation to larger and closed-source LLMs, we develop the $\texttt{R$^2$ATA}$ benchmark, which assesses models' $\underline{R}$easoning $\underline{R}$obustness to $\underline{\texttt{ATA}}$. It includes adversarial typographical questions derived from three widely used reasoning datasets-GSM8K, BBH, and MMLU-by applying $\texttt{ATA}$ to open-source LLMs. $\texttt{R$^2$ATA}$ demonstrates remarkable transferability and causes notable performance drops across multiple super large and closed-source LLMs.
Related papers
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce $underlineLe$arning to $underlineRe$trieve by $underlineT$rying (LeReT)
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z) - LLM Robustness Against Misinformation in Biomedical Question Answering [50.98256373698759]
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering.
We evaluate the effectiveness and robustness of four LLMs against misinformation in answering biomedical questions.
arXiv Detail & Related papers (2024-10-27T16:23:26Z) - Large Language Models Are Overparameterized Text Encoders [17.608805125623803]
Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training.
We show that by pruning the last $p%$ layers of an LLM before supervised training for only 1000 steps, we can achieve a proportional reduction in memory and inference time.
arXiv Detail & Related papers (2024-10-18T16:26:45Z) - LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints [86.59857711385833]
We introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions.
To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline.
Our results show that DeCRIM improves Mistral's performance by 7.3% on RealInstruct and 8.0% on IFEval even with weak feedback.
arXiv Detail & Related papers (2024-10-09T01:25:10Z) - Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning [31.972053219549757]
TREACLE is a reinforcement learning policy that jointly selects the model and prompting scheme while respecting the user's monetary cost and latency constraints.
Our evaluations show that TREACLE enables cost savings of up to 85% compared to baselines, while maintaining high accuracy.
arXiv Detail & Related papers (2024-04-17T05:56:49Z) - Turbulence: Systematically and Automatically Testing Instruction-Tuned
Large Language Models for Code [12.58098809948832]
We present a method for evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence.
Turbulence consists of a large set of natural language $textitquestion templates$, each of which is a programming problem, parameterised so that it can be asked in many different forms.
From a single question template, it is possible to ask an LLM a $textitneighbourhood$ of very similar programming questions, and assess the correctness of the result returned for each question.
arXiv Detail & Related papers (2023-12-22T17:29:08Z) - Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs)
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve almost close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study [44.39031420687302]
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks.
We try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs.
We propose $textitself-augmentation$ for effective structural prompting, such as critical value / range identification.
arXiv Detail & Related papers (2023-05-22T14:23:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.