Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models
- URL: http://arxiv.org/abs/2503.16022v1
- Date: Thu, 20 Mar 2025 10:39:39 GMT
- Title: Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models
- Authors: Mario Sanz-Guerrero, Katharina von der Wense,
- Abstract summary: In-context learning (ICL) has transformed the use of large language models (LLMs) for NLP tasks.<n>Despite its effectiveness, ICL is prone to errors, especially for challenging examples.<n>We propose corrective in-context learning (CICL), an approach that incorporates a model's incorrect predictions alongside ground truth corrections into the prompt.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) has transformed the use of large language models (LLMs) for NLP tasks, enabling few-shot learning by conditioning on labeled examples without finetuning. Despite its effectiveness, ICL is prone to errors, especially for challenging examples. With the goal of improving the performance of ICL, we propose corrective in-context learning (CICL), an approach that incorporates a model's incorrect predictions alongside ground truth corrections into the prompt, aiming to enhance classification accuracy through self-correction. However, contrary to our hypothesis, extensive experiments on text classification tasks demonstrate that CICL consistently underperforms standard ICL, with performance degrading as the proportion of corrections in the prompt increases. Our findings indicate that CICL introduces confusion by disrupting the model's task understanding, rather than refining its predictions. Additionally, we observe that presenting harder examples in standard ICL does not improve performance, suggesting that example difficulty alone may not be a reliable criterion for effective selection. By presenting these negative results, we provide important insights into the limitations of self-corrective mechanisms in LLMs and offer directions for future research.
Related papers
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.
We show that the widely used beam search method suffers from unacceptable over-optimism.
We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context [13.796664304274643]
We introduce a new framework for quantifying optimality of ICL as a learning algorithm in stylized settings.<n>Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes optimal estimator, its efficiency significantly deteriorates in long context.<n>These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods.
arXiv Detail & Related papers (2025-02-07T00:26:45Z) - Loss-Aware Curriculum Learning for Chinese Grammatical Error Correction [21.82403446634522]
Chinese grammatical error correction (CGEC) aims to detect and correct errors in the input Chinese sentences.<n>Current approaches ignore that correction difficulty varies across different instances and treat these samples equally.<n>We propose a multi-granularity Curriculum Learning framework to address this problem.
arXiv Detail & Related papers (2024-12-31T08:11:49Z) - Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z) - Robust Calibration of Large Vision-Language Adapters [17.583536041845402]
This paper addresses the critical issue of miscalibration in CLIP-based model adaptation.
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline.
Motivated by these observations, we present a simple and model-agnostic solution to mitigate miscalibration, by scaling the logit range of each sample to its zero-shot prediction logits.
arXiv Detail & Related papers (2024-07-18T15:27:56Z) - Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z) - On Task Performance and Model Calibration with Supervised and
Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs)
However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration)
arXiv Detail & Related papers (2023-12-21T11:55:10Z) - A Study on the Calibration of In-context Learning [27.533223818505682]
We study in-context learning (ICL), a prevalent method for adapting static language models through tailored prompts.
We observe that, with an increasing number of ICL examples, models initially exhibit increased miscalibration before achieving better calibration.
We explore recalibration techniques and find that a scaling-binning calibrator can reduce calibration errors consistently.
arXiv Detail & Related papers (2023-12-07T03:37:39Z) - In-context Learning and Gradient Descent Revisited [3.085927389171139]
We show that even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL.
Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality.
We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.
arXiv Detail & Related papers (2023-11-13T21:42:38Z) - What and How does In-Context Learning Learn? Bayesian Model Averaging,
Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of pretrained model is bounded by a sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z) - Using Representation Expressiveness and Learnability to Evaluate
Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.