A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
- URL: http://arxiv.org/abs/2510.22594v1
- Date: Sun, 26 Oct 2025 09:21:29 GMT
- Title: A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
- Authors: Bingqing Song, Jiaxiang Li, Rong Wang, Songtao Lu, Mingyi Hong
- Abstract summary: We propose a new framework to analyze ICL performance in a class of realistic settings. We derive the precise relationship between ICL performance, context length, and the KL divergence between the pre-training and query task distributions.
- Score: 52.07397258423034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained large language models have demonstrated a strong ability to learn from context, known as in-context learning (ICL). Despite a surge of recent applications that leverage this capability, it remains theoretically unclear how ICL arises, and in particular what precise roles key factors such as the pre-training procedure and the context construction play. In this work, we propose a new framework to analyze ICL performance in a class of realistic settings, covering the network architecture, data encoding, data generation, and prompt construction process. As a first step, we construct a simple example with a one-layer transformer and show that when the pre-training data distribution differs from the query task distribution, a properly constructed context can shift the output distribution towards the query task distribution in a quantifiable manner, leading to accurate prediction on the query topic. We then extend this finding to a more general setting and derive the precise relationship between ICL performance, context length, and the KL divergence between the pre-training and query task distributions. Finally, we provide experiments to validate our theoretical results.
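The claim that context shifts the output distribution toward the query task admits a small Bayesian-mixture illustration. The sketch below is a hedged toy under invented distributions, not the paper's one-layer-transformer construction: a predictor pre-trained on a mixture of two synthetic topics updates its posterior from context tokens drawn from the query topic, and the KL divergence to the query distribution shrinks with context length.

```python
import numpy as np

# Hedged toy, not the paper's construction: a Bayesian predictor pre-trained
# on a mixture of two synthetic "topics" (categorical token distributions).
# Context tokens drawn from the query topic shift the posterior mixture
# weight toward it, and the KL divergence between the predictive distribution
# and the query topic shrinks with context length. All distributions and
# weights below are invented for illustration.

rng = np.random.default_rng(0)

V = 8                                   # vocabulary size (hypothetical)
topic_a = rng.dirichlet(np.ones(V))     # dominant pre-training topic
topic_b = rng.dirichlet(np.ones(V))     # query topic, rare in pre-training
prior = np.array([0.95, 0.05])          # pre-training mixture weights

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def predictive(context):
    # Posterior over topics given i.i.d. context tokens, then the
    # posterior-predictive token distribution.
    log_post = np.log(prior)
    for tok in context:
        log_post = log_post + np.log([topic_a[tok], topic_b[tok]])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post[0] * topic_a + post[1] * topic_b

for n in [0, 1, 2, 4, 8, 16, 32]:
    context = rng.choice(V, size=n, p=topic_b)  # context from the query topic
    print(f"context length {n:2d}: KL(query || predictive) = "
          f"{kl(topic_b, predictive(context)):.4f}")
```

With the dominant topic carrying 0.95 of the prior weight, the zero-context KL reflects the pre-training mixture; a few dozen query-topic tokens drive it near zero, mirroring the context-length dependence the abstract describes.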
Related papers
- Predicting Task Performance with Context-aware Scaling Laws [56.6850444554434]
We propose a straightforward, interpretable framework that jointly models downstream performance as a function of the training compute and the provided context. We empirically validate our framework by fitting it on the observed downstream performance of extended-context variants of Llama-2-7B and Llama-2-13B. Our results demonstrate that our framework accurately models in-distribution downstream performance, generalizes across three orders of magnitude in training compute, and reliably extrapolates performance as the amount of context increases.
arXiv Detail & Related papers (2025-10-16T17:35:18Z)
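The summary above does not state the paper's functional form, so the following sketch assumes a simple additive power law in training compute and context length purely for illustration; loss_model, the FLOP counts, and the context sizes are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hedged sketch of jointly fitting downstream loss to training compute C and
# context length n. The additive power-law form is an assumption made for
# illustration; the paper's actual parameterization may differ. All FLOP
# counts and context sizes below are synthetic.

def loss_model(X, a, alpha, b, beta, c):
    C, n = X
    return a * C**(-alpha) + b * n**(-beta) + c

rng = np.random.default_rng(1)
C = np.logspace(18, 24, 30)                          # training FLOPs (hypothetical)
n = rng.integers(128, 32768, size=30).astype(float)  # context tokens (hypothetical)
y = loss_model((C, n), 2e5, 0.3, 5.0, 0.4, 1.7) + rng.normal(0, 0.01, 30)

params, _ = curve_fit(loss_model, (C, n), y,
                      p0=[1e5, 0.25, 1.0, 0.3, 1.0],
                      bounds=(0, np.inf))
print("fitted (a, alpha, b, beta, c):", np.round(params, 3))
```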
- How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness [6.723482324209954]
We show how statistical properties of the pretraining distribution shape ICL on numerical tasks. We develop a theoretical framework that unifies task selection and generalization, extending and sharpening earlier results. We then empirically study how ICL performance varies with the pretraining distribution on challenging tasks.
arXiv Detail & Related papers (2025-10-01T17:52:29Z)
- Pretrain-Test Task Alignment Governs Generalization in In-Context Learning [39.98824138502169]
In this work, we study how the structure of pretraining tasks governs generalization in ICL. Using a solvable model for ICL of linear regression by linear attention, we derive an exact expression for ICL generalization error in high dimensions. We show that the resulting measure of pretrain-test task alignment directly predicts ICL performance not only in the solvable model but also in nonlinear Transformers.
arXiv Detail & Related papers (2025-09-30T17:19:58Z)
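A hedged numerical toy of the alignment idea, not the paper's exact solvable linear-attention model: test-time linear-regression tasks are drawn from a rotated version of the pretraining task prior, and a predictor that is Bayes-optimal under the pretraining prior degrades as the misalignment grows.

```python
import numpy as np

# Hedged toy of pretrain-test task alignment for in-context linear regression.
# The "model" is the Bayes posterior-mean predictor under the *pretraining*
# task prior w ~ N(0, S_pre); test tasks come from a rotated prior. The
# rotation scheme and error metric are illustrative assumptions.

rng = np.random.default_rng(2)
d, n_ctx, noise = 16, 8, 0.1

S_pre = np.diag(np.linspace(3.0, 0.1, d))   # anisotropic pretraining prior

def rotation(theta, d):
    # Rotate the largest-variance axis toward the smallest-variance axis.
    R = np.eye(d)
    i, j = 0, d - 1
    R[i, i] = R[j, j] = np.cos(theta)
    R[i, j], R[j, i] = -np.sin(theta), np.sin(theta)
    return R

for theta in [0.0, 0.5, 1.0, 1.5]:
    S_test = rotation(theta, d) @ S_pre @ rotation(theta, d).T
    errs = []
    for _ in range(500):
        w = rng.multivariate_normal(np.zeros(d), S_test)   # test-time task
        X = rng.normal(size=(n_ctx, d))                    # context inputs
        y = X @ w + noise * rng.normal(size=n_ctx)         # context labels
        # Posterior mean under the (possibly misaligned) pretraining prior.
        A = X.T @ X / noise**2 + np.linalg.inv(S_pre)
        w_hat = np.linalg.solve(A, X.T @ y / noise**2)
        x_q = rng.normal(size=d)                           # query input
        errs.append(float((x_q @ (w_hat - w))**2))
    print(f"misalignment {theta:.1f} rad: mean query error {np.mean(errs):.3f}")
```

Larger rotations place test-task variance exactly where the pretraining prior shrinks hardest, so query error rises with misalignment, which is the qualitative content of the alignment claim.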
- A Pre-training Framework for Relational Data with Information-theoretic Principles [57.93973948947743]
We introduce Task Vector Estimation (TVE), a novel pre-training framework that constructs supervisory signals via set-based aggregation over relational graphs. TVE consistently outperforms traditional pre-training baselines. Our findings advocate for pre-training objectives that encode task heterogeneity and temporal structure as design principles for predictive modeling on relational databases.
arXiv Detail & Related papers (2025-07-14T00:17:21Z)
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
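The idea of a compact task representation extracted from context can be cartooned in a linear setting; the toy below is an illustrative assumption on our part, not the paper's attention-head formulation: when each task is an additive offset in embedding space, averaging output-minus-input differences over demonstrations yields a task vector that transfers to a new query.

```python
import numpy as np

# Cartoon of a "task representation" extracted from context, not the paper's
# attention-head method: in a linear world where each task is an additive
# offset in embedding space, averaging (output - input) differences over the
# in-context demonstrations recovers a task vector that transfers to a query.

rng = np.random.default_rng(3)
d = 32
task_offset = rng.normal(size=d)            # latent "task" (hypothetical)

X = rng.normal(size=(10, d))                # demonstration inputs
Y = X + task_offset + 0.05 * rng.normal(size=(10, d))  # noisy outputs

task_vector = (Y - X).mean(axis=0)          # estimate the task from context

x_query = rng.normal(size=d)
y_pred = x_query + task_vector              # apply the extracted task
y_true = x_query + task_offset
print("task recovery error:", np.linalg.norm(task_vector - task_offset))
print("query error:        ", np.linalg.norm(y_pred - y_true))
```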
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.<n>We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.<n>We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Understanding In-Context Learning via Supportive Pretraining Data [55.648777340129364]
In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL via analyzing instance-level pretraining data.
arXiv Detail & Related papers (2023-06-26T22:14:04Z)
- A Theory of Emergent In-Context Learning as Implicit Structure Induction [8.17811111226145]
Scaling large language models leads to an emergent capacity to learn in-context from example demonstrations.
We argue that in-context learning relies on recombination of compositional operations found in natural language data.
We show how in-context learning is supported by a representation of the input's compositional structure.
arXiv Detail & Related papers (2023-03-14T15:24:05Z)
- Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
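The latent-concept view admits a compact illustration. In the hedged toy below, the prompt's examples are generated by one of two invented function families and a Bayes-optimal predictor infers which one; this is only a cartoon of the paper's setting, where pretraining documents are generated from latent concepts.

```python
import numpy as np

# Hedged cartoon of ICL as implicit Bayesian inference of a latent concept:
# the prompt's (x, y) pairs come from one of two latent function families,
# and a Bayes-optimal predictor infers which. The families, prior, and noise
# level are invented for illustration.

rng = np.random.default_rng(4)
noise = 0.1

concepts = {
    "linear": lambda x: 2.0 * x,
    "sine":   lambda x: np.sin(3.0 * x),
}

# A prompt actually generated by the "sine" concept.
x = rng.uniform(-1, 1, size=6)
y = concepts["sine"](x) + noise * rng.normal(size=6)

def log_likelihood(f):
    r = y - f(x)
    return -0.5 * np.sum(r**2) / noise**2

logp = np.array([log_likelihood(f) for f in concepts.values()])
post = np.exp(logp - logp.max())
post /= post.sum()                          # uniform prior over concepts
for name, p in zip(concepts, post):
    print(f"P(concept = {name} | prompt) = {p:.4f}")
```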
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.