The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs
- URL: http://arxiv.org/abs/2502.04134v1
- Date: Thu, 06 Feb 2025 15:14:02 GMT
- Title: The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs
- Authors: Bryan Guan, Tanya Roosta, Peyman Passban, Mehdi Rezagholizadeh
- Abstract summary: This paper investigates the extent of order sensitivity in closed-source large language models (LLMs).
Our results show that input order significantly affects performance across tasks, with shuffled inputs leading to measurable declines in output accuracy.
Few-shot prompting shows mixed effectiveness: it offers partial mitigation but fails to fully resolve the problem.
- Abstract: As large language models (LLMs) become integral to diverse applications, ensuring their reliability under varying input conditions is crucial. One key issue affecting this reliability is order sensitivity, wherein slight variations in input arrangement can lead to inconsistent or biased outputs. Although recent advances have reduced this sensitivity, the problem remains unresolved. This paper investigates the extent of order sensitivity in closed-source LLMs by conducting experiments across multiple tasks, including paraphrasing, relevance judgment, and multiple-choice questions. Our results show that input order significantly affects performance across tasks, with shuffled inputs leading to measurable declines in output accuracy. Few-shot prompting demonstrates mixed effectiveness and offers partial mitigation; however, it fails to fully resolve the problem. These findings highlight persistent risks, particularly in high-stakes applications, and point to the need for more robust LLMs or improved input-handling techniques in future development.
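The shuffled-input evaluation described in the abstract can be sketched as a small harness: shuffle the answer options of a multiple-choice question several times, query the model each round, and measure how often its choice, mapped back to the original option text, stays the same. This is an illustrative sketch, not the paper's code; `model_fn` is a placeholder for a call to a closed-source LLM API, and all names here are assumptions.

```python
import random

def shuffled_consistency(model_fn, question, options, trials=20, seed=0):
    """Estimate order robustness of a multiple-choice answerer.

    model_fn(question, options) -> index of the chosen option. In the
    paper's setting this would wrap an API call to a closed-source LLM;
    here it is any callable. Returns the fraction of trials on which the
    model picked its majority answer (1.0 = fully order-robust).
    """
    rng = random.Random(seed)
    picks = []
    for _ in range(trials):
        # Present the same options in a fresh random order each trial.
        order = list(range(len(options)))
        rng.shuffle(order)
        shuffled = [options[i] for i in order]
        choice = model_fn(question, shuffled)
        # Map the chosen position back to the original option text.
        picks.append(options[order[choice]])
    majority = max(set(picks), key=picks.count)
    return picks.count(majority) / trials
```

A purely position-biased answerer (e.g. one that always picks the first option) scores well below 1.0 under this harness, while a content-based answerer scores exactly 1.0, which is the contrast the paper's shuffled-input experiments probe.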
Related papers
- Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code-generation settings by evaluating the performance of proprietary and open-weight models.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Active Task Disambiguation with LLMs [48.54945212561785]
We introduce a formal definition of task ambiguity and frame the problem of task disambiguation through the lens of Bayesian Experimental Design.
Our proposed approach of active task disambiguation enables LLM agents to generate targeted questions maximizing the information gain.
Empirical results demonstrate that this form of question selection leads to more effective task disambiguation in comparison to approaches relying on reasoning solely within the space of questions.
arXiv Detail & Related papers (2025-02-06T20:20:22Z) - Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices.
One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses.
We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z) - ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models.
Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z) - How Susceptible are LLMs to Influence in Prompts? [6.644673474240519]
Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein.
We study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model.
Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation.
arXiv Detail & Related papers (2024-08-17T17:40:52Z) - On the Worst Prompt Performance of Large Language Models [93.13542053835542]
Performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models [12.947537874888717]
Varied inference latency, identified as a consequence of uncertainty intrinsic to the nature of language, can lead to computational inefficiency.
We present RT-LM, an uncertainty-aware resource management ecosystem for real-time inference of LMs.
We show that RT-LM can significantly reduce the average response time and improve throughput while incurring a rather small runtime overhead.
arXiv Detail & Related papers (2023-09-12T22:22:10Z) - Uncertainty Injection: A Deep Learning Method for Robust Optimization [16.13344685457395]
This paper proposes a paradigm of uncertainty injection for training deep learning models to solve robust optimization problems.
We identify wireless communications as an application field where uncertainties are prevalent in problem parameters.
We show the effectiveness of the proposed training scheme in two applications.
arXiv Detail & Related papers (2023-02-23T19:59:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.