On the Relation between Sensitivity and Accuracy in In-context Learning
- URL: http://arxiv.org/abs/2209.07661v3
- Date: Sat, 27 Jan 2024 08:07:34 GMT
- Title: On the Relation between Sensitivity and Accuracy in In-context Learning
- Authors: Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
- Abstract summary: In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios.
We study the sensitivity of ICL with respect to multiple perturbation types.
We propose \textsc{SenSel}, a few-shot selective prediction method that abstains from sensitive predictions.
- Score: 41.27837171531926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) suffers from oversensitivity to the prompt, making
it unreliable in real-world scenarios. We study the sensitivity of ICL with
respect to multiple perturbation types. First, we find that label bias obscures
the true sensitivity, and therefore prior work may have significantly
underestimated ICL sensitivity. Second, we observe a strong negative
correlation between ICL sensitivity and accuracy: predictions sensitive to
perturbations are less likely to be correct. Motivated by these findings, we
propose \textsc{SenSel}, a few-shot selective prediction method that abstains
from sensitive predictions. Experiments on ten classification datasets show
that \textsc{SenSel} consistently outperforms two commonly used
confidence-based and entropy-based baselines on abstention decisions.
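The abstention idea in the abstract can be sketched as follows. This is a minimal illustration of sensitivity-based selective prediction, not the paper's exact procedure: `model_predict`, `perturb`, the number of perturbations `n`, and the abstention `threshold` are all hypothetical placeholders.

```python
def sensitivity(model_predict, prompt, perturb, n=8):
    """Fraction of perturbed prompts whose prediction differs from the original."""
    base = model_predict(prompt)
    flips = sum(model_predict(perturb(prompt)) != base for _ in range(n))
    return flips / n

def selective_predict(model_predict, prompt, perturb, threshold=0.25):
    """Abstain (return None) when the prediction is too sensitive to perturbations."""
    if sensitivity(model_predict, prompt, perturb) > threshold:
        return None  # abstain rather than risk a likely-incorrect prediction
    return model_predict(prompt)
```

The design choice mirrors the paper's finding: since sensitive predictions are less likely to be correct, a sensitivity score can replace model confidence or entropy as the abstention signal.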
Related papers
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726] (2024-10-16)
  ProSA is a framework designed to evaluate and understand prompt sensitivity in large language models.
  The study finds that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting greater robustness.
- A Neural Framework for Generalized Causal Sensitivity Analysis [78.71545648682705] (2023-11-27)
  The authors propose NeuralCSA, a neural framework for causal sensitivity analysis, with theoretical guarantees that it can infer valid bounds on the causal query of interest.
- How are Prompts Different in Terms of Sensitivity? [50.67313477651395] (2023-11-13)
  A comprehensive prompt analysis based on the sensitivity of a function.
  Gradient-based saliency scores empirically demonstrate how different prompts affect the relevance of input tokens to the output.
  The paper introduces sensitivity-aware decoding, which incorporates a sensitivity estimate as a penalty term in standard greedy decoding.
- The Memory Perturbation Equation: Understanding Model's Sensitivity to Data [16.98312108418346] (2023-10-30)
  The Memory-Perturbation Equation (MPE) relates a model's sensitivity to perturbations of its training data.
  Empirical results show that sensitivity estimates obtained during training can faithfully predict generalization on unseen test data.
- Sharp Bounds for Generalized Causal Sensitivity Analysis [30.77874108094485] (2023-05-26)
  A unified framework for causal sensitivity analysis under unobserved confounding, covering (conditional) average treatment effects, effects for mediation and path analysis, and distributional effects.
  The bounds for (conditional) average treatment effects coincide with recent optimality results for causal sensitivity analysis.
- Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction [86.15787587540132] (2022-11-13)
  Introduces a sensitivity score, a metric that scrutinizes model behavior at the vocabulary level.
  Experiments compare the decision-making logic of clinicians and classifiers based on rank correlations of sensitivity scores.
- Balancing Robustness and Sensitivity using Feature Contrastive Learning [95.86909855412601] (2021-05-19)
  Methods that promote robustness can hurt the model's sensitivity to rare or underrepresented patterns.
  Feature Contrastive Learning (FCL) encourages a model to be more sensitive to features with higher contextual utility.
- Sensitivity as a Complexity Measure for Sequence Classification Tasks [24.246784593571626] (2021-04-21)
  Argues that standard sequence classification methods are biased towards learning low-sensitivity functions, so that tasks requiring high sensitivity are more difficult.
  Sensitivity estimates on 15 NLP tasks show that sensitivity is higher on challenging GLUE tasks than on simple text classification tasks.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.