Inference and Verbalization Functions During In-Context Learning
- URL: http://arxiv.org/abs/2410.09349v1
- Date: Sat, 12 Oct 2024 03:31:37 GMT
- Title: Inference and Verbalization Functions During In-Context Learning
- Authors: Junyi Tao, Xiaoyin Chen, Nelson F. Liu
- Abstract summary: Large language models (LMs) are capable of in-context learning (ICL) from a few demonstrations to solve new tasks during inference.
Previous work has observed that, in some settings, ICL performance is minimally affected by irrelevant labels.
We hypothesize that LMs perform ICL with irrelevant labels via two sequential processes: an inference function that solves the task, followed by a verbalization function that maps the inferred answer to the label space.
- Score: 7.544880309193842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LMs) are capable of in-context learning from a few demonstrations (example-label pairs) to solve new tasks during inference. Despite the intuitive importance of high-quality demonstrations, previous work has observed that, in some settings, ICL performance is minimally affected by irrelevant labels (Min et al., 2022). We hypothesize that LMs perform ICL with irrelevant labels via two sequential processes: an inference function that solves the task, followed by a verbalization function that maps the inferred answer to the label space. Importantly, we hypothesize that the inference function is invariant to remappings of the label space (e.g., "true"/"false" to "cat"/"dog"), enabling LMs to share the same inference function across settings with different label words. We empirically validate this hypothesis with controlled layer-wise interchange intervention experiments. Our findings confirm the hypotheses on multiple datasets and tasks (natural language inference, sentiment analysis, and topic classification) and further suggest that the two functions can be localized in specific layers across various open-sourced models, including GEMMA-7B, MISTRAL-7B-V0.3, GEMMA-2-27B, and LLAMA-3.1-70B.
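The interchange intervention used to test this hypothesis amounts to activation patching: hidden states from a run with remapped labels are swapped into a run with the original labels at a chosen layer. Below is a minimal sketch of one such layer-wise swap, assuming a Hugging Face causal LM; the prompts, layer index, and final-token position are illustrative placeholders, not the paper's exact experimental setup.

```python
# A minimal sketch of a layer-wise interchange intervention (activation
# patching), assuming a Hugging Face causal LM. Prompts, layer index, and
# token position are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-7b"  # one of the models studied in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

base = "... demonstrations labeled true/false ... Label:"      # original label space
source = "... same demonstrations labeled cat/dog ... Label:"  # remapped label space
layer_idx = 20  # hidden_states[layer_idx] is the output of decoder layer layer_idx - 1

# 1) Cache the source run's hidden state at the final token of the chosen layer.
with torch.no_grad():
    out = model(**tok(source, return_tensors="pt"), output_hidden_states=True)
src_hidden = out.hidden_states[layer_idx][:, -1]

# 2) Patch that state into the base run at the same layer and position.
def patch(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1] = src_hidden
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.model.layers[layer_idx - 1].register_forward_hook(patch)
with torch.no_grad():
    logits = model(**tok(base, return_tensors="pt")).logits[:, -1]
handle.remove()

# If inference happens below the patched layer and verbalization above it,
# the base run should now verbalize the source example's inferred answer
# in the base label space ("true"/"false").
print(tok.decode(logits.argmax(-1)))
```

Sweeping layer_idx and checking where the base-space label emerges is what localizes the two functions to specific layers.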
Related papers
- Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation [0.09558392439655014]
We assess whether pre-trained autoregressive LLMs possess consistent, expressible knowledge about thematic fit.
We evaluate both closed and open state-of-the-art LLMs on several psycholinguistic datasets.
Our results show that chain-of-thought reasoning is more effective on datasets with self-explanatory semantic role labels.
arXiv Detail & Related papers (2024-10-19T18:25:30Z)
- Rectifying Demonstration Shortcut in In-Context Learning [15.08431909212102]
Large language models (LLMs) are able to solve various tasks with only a few demonstrations utilizing their in-context learning (ICL) abilities.
LLMs often rely on the pre-trained semantic priors of demonstrations rather than on the input-label relationships when making ICL predictions.
arXiv Detail & Related papers (2024-03-14T15:30:14Z)
- Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We investigate the capabilities of smaller self-adaptive LMs given only unlabeled test data.
Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets.
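One common way to realize such self-adaptation is pseudo-label fine-tuning: sample several stochastic answers per unlabeled question, keep the confident majority votes, and briefly train on them. The sketch below illustrates that generic recipe and only loosely mirrors the paper's strategy; generate_answer and finetune are hypothetical helpers.

```python
# A generic sketch of test-time self-adaptation via self-generated pseudo-labels.
# This majority-vote-then-finetune recipe is a common pattern, not necessarily
# the paper's exact strategy; `generate_answer` and `finetune` are hypothetical
# helpers standing in for a sampling call and a short training loop.
from collections import Counter

def self_adapt(model, unlabeled_questions, n_samples=8, min_agreement=0.5):
    pseudo_data = []
    for q in unlabeled_questions:
        # Sample several stochastic answers and keep the majority vote.
        answers = [model.generate_answer(q, temperature=0.7) for _ in range(n_samples)]
        answer, votes = Counter(answers).most_common(1)[0]
        if votes / n_samples >= min_agreement:  # keep only confident pseudo-labels
            pseudo_data.append((q, answer))
    model.finetune(pseudo_data)  # brief adaptation pass on the pseudo-labeled set
    return model
```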
arXiv Detail & Related papers (2023-10-20T06:49:32Z)
- Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning [77.7070536959126]
In-context learning (ICL) emerges as a promising capability of large language models (LLMs).
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
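A simple way to probe this information-flow view is to measure how much attention the final (prediction) position pays to the demonstrations' label-word positions at each layer. The sketch below shows such a measurement for GPT2-XL; the prompt and label-word index are illustrative placeholders, and this raw-attention metric simplifies the paper's saliency-based analysis.

```python
# A minimal sketch of an attention-to-label-words measurement on GPT2-XL
# (the model analyzed in the paper). The prompt and the label-word position
# are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
model.eval()

prompt = "Review: great movie. Sentiment: positive\nReview: dull plot. Sentiment:"
inputs = tok(prompt, return_tensors="pt")
label_positions = [8]  # hypothetical token index of the label word "positive"

with torch.no_grad():
    attentions = model(**inputs, output_attentions=True).attentions

for layer, attn in enumerate(attentions):
    # Attention mass flowing from the final (prediction) position to the
    # label-word positions, averaged over heads.
    to_labels = attn[0, :, -1, label_positions].sum(-1).mean().item()
    print(f"layer {layer:2d}: attention to label words = {to_labels:.3f}")
```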
arXiv Detail & Related papers (2023-05-23T15:26:20Z)
- Influence Functions for Sequence Tagging Models [49.81774968547377]
We extend influence functions to trace predictions back to the training points that informed them.
We show the practical utility of segment influence by using the method to identify systematic annotation errors.
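Full influence functions require an inverse-Hessian term; a common first-order shortcut approximates influence by the dot product of training and test loss gradients (as in TracIn). The sketch below uses that approximation rather than the paper's exact segment-level formulation; loss_fn is a hypothetical helper returning a scalar loss for one example.

```python
# A first-order sketch of tracing a test prediction back to training points:
# influence is approximated by the dot product of training and test loss
# gradients (TracIn-style), sidestepping the inverse-Hessian of full influence
# functions. `loss_fn` is a hypothetical helper; flattening all gradients is
# only practical for small models.
import torch

def grad_vector(model, loss):
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.flatten() for g in grads])

def influence_scores(model, loss_fn, train_examples, test_example):
    g_test = grad_vector(model, loss_fn(model, test_example))
    scores = []
    for example in train_examples:
        g_train = grad_vector(model, loss_fn(model, example))
        # Large positive score: this training point strongly informed the prediction.
        scores.append(torch.dot(g_train, g_test).item())
    return scores  # rank training points by score to surface likely annotation errors
```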
arXiv Detail & Related papers (2022-10-25T17:13:11Z)
- SepLL: Separating Latent Class Labels from Weak Supervision Noise [4.730767228515796]
In weakly supervised learning, labeling functions automatically assign labels, which are often noisy, to data samples.
In this work, we provide a method for learning from weak labels by separating two types of complementary information.
Our model is competitive with the state-of-the-art, and yields a new best average performance.
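The separation idea can be pictured as a two-branch head on a shared encoder: one branch models the latent class label, the other absorbs labeling-function-specific signal, and only their combination is trained against the weak votes. The sketch below is schematic; the dimensions and the class-to-labeling-function routing are illustrative assumptions, not the paper's exact architecture.

```python
# A schematic two-branch head in the spirit of SepLL: a latent-label branch
# plus a noise branch, combined only in the weak-label training space. The
# routing matrix and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class SeparatingHeads(nn.Module):
    def __init__(self, hidden_dim, n_classes, n_labeling_fns, class_to_lf):
        super().__init__()
        self.label_head = nn.Linear(hidden_dim, n_classes)       # latent class logits
        self.noise_head = nn.Linear(hidden_dim, n_labeling_fns)  # LF-specific logits
        # Fixed 0/1 routing matrix: which labeling functions vote for which class.
        self.register_buffer("class_to_lf", class_to_lf)         # (n_classes, n_labeling_fns)

    def forward(self, h):
        class_logits = self.label_head(h)  # used on its own at test time
        # Training target space: weak labeling-function votes = routed class
        # signal plus the noise branch.
        lf_logits = class_logits @ self.class_to_lf + self.noise_head(h)
        return class_logits, lf_logits
```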
arXiv Detail & Related papers (2022-10-25T10:33:45Z)
- A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly important task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
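At its core, the sentence-level term of such a framework is a supervised contrastive loss that pulls same-label embeddings together and pushes different-label embeddings apart (following Khosla et al., 2020). A minimal sketch, not the paper's full multi-level objective:

```python
# A minimal sketch of a supervised contrastive loss over sentence embeddings,
# in the spirit of the framework's sentence-level term; the paper's full
# multi-level objective adds components not shown here.
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings, labels, temperature=0.07):
    z = F.normalize(embeddings, dim=1)                    # (N, d), unit-norm
    sim = z @ z.T / temperature                           # pairwise similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, -1e9)                # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)   # log-softmax over others
    positives = (labels[:, None] == labels[None, :]) & ~mask_self
    # Mean log-probability of same-label pairs per anchor, negated and averaged;
    # anchors with no same-label partner contribute zero.
    per_anchor = (log_prob * positives).sum(1) / positives.sum(1).clamp(min=1)
    return -per_anchor.mean()
```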
arXiv Detail & Related papers (2022-05-31T05:54:18Z)
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [112.72413411257662]
Large language models (LMs) are able to in-context learn by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
We show that ground truth demonstrations are in fact not required -- randomly replacing labels in the demonstrations barely hurts performance.
We find that other aspects of the demonstrations are the key drivers of end task performance.
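The core manipulation is easy to reproduce: keep the demonstration inputs, resample their labels uniformly from the label space, and compare ICL accuracy against gold-label prompts. A minimal sketch with placeholder sentiment data:

```python
# A minimal sketch of the random-label manipulation: demonstration inputs are
# kept, labels are resampled uniformly, and ICL accuracy with these prompts is
# compared against gold-label prompts. The data and template are placeholders.
import random

LABELS = ["positive", "negative"]

def build_prompt(demos, query, randomize_labels=False):
    lines = []
    for text, label in demos:
        if randomize_labels:
            label = random.choice(LABELS)  # replace the gold label with a random one
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A moving, beautifully shot film.", "positive"),
         ("Two hours I will never get back.", "negative")]
print(build_prompt(demos, "Sharp writing and a great cast.", randomize_labels=True))
```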
arXiv Detail & Related papers (2022-02-25T17:25:19Z)
- Comparing Text Representations: A Theory-Driven Approach [2.893558866535708]
We adapt general tools from computational learning theory to fit the specific characteristics of text datasets.
We present a method to evaluate the compatibility between representations and tasks.
This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task.
arXiv Detail & Related papers (2021-09-15T17:48:19Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrastive examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
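One way to operationalize gradient supervision is to encourage the input-gradient of the task loss at an example to point toward its counterfactual twin. The sketch below assumes continuous input features (e.g., pooled embeddings) and only loosely follows the paper's formulation; aux_weight is an illustrative hyperparameter.

```python
# A minimal sketch of gradient supervision on counterfactual pairs: the
# input-gradient at an example is encouraged to align with the direction
# toward its minimally-different, differently-labeled twin. Assumes continuous
# inputs; `aux_weight` is an illustrative hyperparameter.
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, loss_fn, x, y, x_cf, y_cf, aux_weight=0.1):
    x = x.clone().requires_grad_(True)
    task_loss = loss_fn(model(x), y)
    # Input-gradient of the task loss, kept in the graph (create_graph=True)
    # so the alignment term below is itself differentiable.
    (grad_x,) = torch.autograd.grad(task_loss, x, create_graph=True)
    direction = (x_cf - x).detach()  # vector pointing toward the counterfactual twin
    cos = F.cosine_similarity(grad_x.flatten(1), direction.flatten(1))
    aux = (1.0 - cos).mean()  # reward alignment between gradient and direction
    return task_loss + loss_fn(model(x_cf), y_cf) + aux_weight * aux
```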
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.