The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs
- URL: http://arxiv.org/abs/2502.04134v1
- Date: Thu, 06 Feb 2025 15:14:02 GMT
- Title: The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs
- Authors: Bryan Guan, Tanya Roosta, Peyman Passban, Mehdi Rezagholizadeh
- Abstract summary: This paper investigates the extent of order sensitivity in closed-source large language models (LLMs).
Our results show that input order significantly affects performance across tasks, with shuffled inputs leading to measurable declines in output accuracy.
Few-shot prompting shows mixed effectiveness: it offers partial mitigation but fails to fully resolve the problem.
- Abstract: As large language models (LLMs) become integral to diverse applications, ensuring their reliability under varying input conditions is crucial. One key issue affecting this reliability is order sensitivity, wherein slight variations in input arrangement can lead to inconsistent or biased outputs. Although recent advances have reduced this sensitivity, the problem remains unresolved. This paper investigates the extent of order sensitivity in closed-source LLMs by conducting experiments across multiple tasks, including paraphrasing, relevance judgment, and multiple-choice questions. Our results show that input order significantly affects performance across tasks, with shuffled inputs leading to measurable declines in output accuracy. Few-shot prompting demonstrates mixed effectiveness and offers partial mitigation; however, it fails to fully resolve the problem. These findings highlight persistent risks, particularly in high-stakes applications, and point to the need for more robust LLMs or improved input-handling techniques in future development.
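The shuffled-input evaluation described in the abstract can be sketched as a small harness: shuffle the answer options of a multiple-choice question several times, query the model each round, and measure how often its choice, mapped back to the original option text, stays the same. This is an illustrative sketch, not the paper's code; `model_fn` is a placeholder for a call to a closed-source LLM API, and all names here are assumptions.

```python
import random

def shuffled_consistency(model_fn, question, options, trials=20, seed=0):
    """Estimate order robustness of a multiple-choice answerer.

    model_fn(question, options) -> index of the chosen option. In the
    paper's setting this would wrap an API call to a closed-source LLM;
    here it is any callable. Returns the fraction of trials on which the
    model picked its majority answer (1.0 = fully order-robust).
    """
    rng = random.Random(seed)
    picks = []
    for _ in range(trials):
        # Present the same options in a fresh random order each trial.
        order = list(range(len(options)))
        rng.shuffle(order)
        shuffled = [options[i] for i in order]
        choice = model_fn(question, shuffled)
        # Map the chosen position back to the original option text.
        picks.append(options[order[choice]])
    majority = max(set(picks), key=picks.count)
    return picks.count(majority) / trials
```

A purely position-biased answerer (e.g. one that always picks the first option) scores well below 1.0 under this harness, while a content-based answerer scores exactly 1.0, which is the contrast the paper's shuffled-input experiments probe.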
Related papers
- Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code-generation settings by evaluating the performance of proprietary and open-weight models.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Active Task Disambiguation with LLMs [48.54945212561785]
We introduce a formal definition of task ambiguity and frame the problem of task disambiguation through the lens of Bayesian Experimental Design.
Our proposed approach of active task disambiguation enables LLM agents to generate targeted questions maximizing the information gain.
Empirical results demonstrate that this form of question selection leads to more effective task disambiguation in comparison to approaches relying on reasoning solely within the space of questions.
arXiv Detail & Related papers (2025-02-06T20:20:22Z) - Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices.
One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses.
We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z) - ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models.
Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z) - How Susceptible are LLMs to Influence in Prompts? [6.644673474240519]
Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein.
We study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model.
Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation.
arXiv Detail & Related papers (2024-08-17T17:40:52Z) - On the Worst Prompt Performance of Large Language Models [93.13542053835542]
Performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models [12.947537874888717]
Varied inference latency, identified as a consequence of uncertainty intrinsic to the nature of language, can lead to computational inefficiency.
We present RT-LM, an uncertainty-aware resource management ecosystem for real-time inference of LMs.
We show that RT-LM can significantly reduce the average response time and improve throughput while incurring a rather small runtime overhead.
arXiv Detail & Related papers (2023-09-12T22:22:10Z) - Uncertainty Injection: A Deep Learning Method for Robust Optimization [16.13344685457395]
This paper proposes a paradigm of uncertainty injection for training deep learning models to solve robust optimization problems.
We identify wireless communications as an application field where uncertainties are prevalent in problem parameters.
We show the effectiveness of the proposed training scheme in two applications.
arXiv Detail & Related papers (2023-02-23T19:59:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.