In-Context Function Learning in Large Language Models
- URL: http://arxiv.org/abs/2602.11863v1
- Date: Thu, 12 Feb 2026 12:09:48 GMT
- Title: In-Context Function Learning in Large Language Models
- Authors: Elif Akata, Konstantinos Voudouris, Vincent Fortuin, Eric Schulz
- Abstract summary: Large language models (LLMs) can learn from a few demonstrations provided at inference time. We study this in-context learning phenomenon through the lens of Gaussian Processes (GPs). We find that LLM learning curves are strongly influenced by the function-generating kernels and approach the GP lower bound as the number of demonstrations increases.
- Score: 19.618773481188626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) can learn from a few demonstrations provided at inference time. We study this in-context learning phenomenon through the lens of Gaussian Processes (GPs). We build controlled experiments where models observe sequences of multivariate scalar-valued function samples drawn from known GP priors. We evaluate prediction error in relation to the number of demonstrations and compare against two principled references: (i) an empirical GP-regression learner that gives a lower bound on achievable error, and (ii) the expected error of a 1-nearest-neighbor (1-NN) rule, which gives a data-driven upper bound. Across model sizes, we find that LLM learning curves are strongly influenced by the function-generating kernels and approach the GP lower bound as the number of demonstrations increases. We then study the inductive biases of these models using a likelihood-based analysis. We find that LLM predictions are most likely under less smooth GP kernels. Finally, we explore whether post-training can shift these inductive biases and improve sample-efficiency on functions sampled from GPs with smoother kernels. We find that both reinforcement learning and supervised fine-tuning can effectively shift inductive biases in the direction of the training data. Together, our framework quantifies the extent to which LLMs behave like GP learners and provides tools for steering their inductive biases for continuous function learning tasks.
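To make the evaluation protocol concrete, here is a minimal numpy sketch (the kernel choice, lengthscale, and sizes are illustrative assumptions, not the authors' setup): draw a function from a known GP prior, reveal n demonstrations, and compare the GP-regression posterior mean (the achievable lower bound) against a 1-NN rule (the data-driven upper bound).

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel: k(x, x') = s^2 exp(-||x - x'||^2 / (2 l^2)).
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
# One function drawn from the known GP prior; jitter keeps the covariance PSD.
f = rng.multivariate_normal(np.zeros(60), rbf_kernel(X, X) + 1e-8 * np.eye(60))

for n in (2, 5, 10, 20, 40):  # number of in-context demonstrations
    Xd, yd, Xq, yq = X[:n], f[:n], X[n:], f[n:]
    # (i) GP-regression posterior mean: best achievable under the true prior.
    K = rbf_kernel(Xd, Xd) + 1e-6 * np.eye(n)
    gp_pred = rbf_kernel(Xq, Xd) @ np.linalg.solve(K, yd)
    # (ii) 1-NN rule: copy the value of the nearest demonstration.
    nn = np.argmin(np.abs(Xq - Xd.T), axis=1)
    print(n, np.mean((gp_pred - yq) ** 2), np.mean((yd[nn] - yq) ** 2))
```

As n grows, the GP-regression error in this toy falls well below the 1-NN error, mirroring the paper's lower and upper reference curves.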
Related papers
- Empirical Gaussian Processes [18.40952262882312]
Empirical GPs are a principled framework for constructing flexible, data-driven GP priors.
We show that Empirical GPs achieve competitive performance on learning curve extrapolation and time series forecasting benchmarks.
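As a minimal illustration of a data-driven GP prior (my sketch of the general idea, not the paper's construction), one can pick kernel hyperparameters empirically by maximizing the GP log marginal likelihood on observed data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, (30, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(30)

def log_marginal(l):
    # GP log marginal likelihood under an RBF kernel with lengthscale l.
    K = np.exp(-0.5 * (X - X.T) ** 2 / l**2) + 0.01 * np.eye(30)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + 30 * np.log(2 * np.pi))

ls = np.linspace(0.05, 1.0, 20)
print(ls[np.argmax([log_marginal(l) for l in ls])])  # empirically chosen prior
```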
arXiv Detail & Related papers (2026-02-12T15:39:08Z)
- Self-Supervised Learning with Gaussian Processes [0.9058737915650011]
Self-supervised learning (SSL) is a machine learning paradigm in which models learn the underlying structure of data without explicit supervision from labeled samples.
To ensure smoothness of the representation space, most SSL methods rely on the ability to generate pairs of observations that are similar to a given instance.
We show that GPSSL is closely related to both kernel PCA and VICReg, a popular neural-network-based SSL method, but unlike both it allows for posterior uncertainties that can be propagated to downstream tasks.
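For readers unfamiliar with the kernel-PCA side of this connection, here is bare-bones kernel PCA in numpy (illustrative only; GPSSL's construction and its posterior uncertainties are not reproduced here): embed data via the top eigenvectors of the centered kernel matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 5))
# RBF kernel matrix over the data.
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

n = len(X)
H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
evals, evecs = np.linalg.eigh(H @ K @ H)     # ascending eigenvalues
Z = evecs[:, -2:] * np.sqrt(np.maximum(evals[-2:], 0))  # 2-D embedding
print(Z.shape)
```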
arXiv Detail & Related papers (2025-12-10T05:10:40Z)
- Approaching Maximal Information Extraction in Low-Signal Regimes via Multiple Instance Learning [0.0]
We propose a new machine learning (ML) methodology to obtain more precise predictions.
We show that it might be possible to extract the theoretical maximum Fisher information latent in a dataset.
arXiv Detail & Related papers (2025-08-09T22:46:55Z)
- Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.
One core challenge of evaluation in the LLM era is generalization.
We propose the Model Utilization Index (MUI), a metric enhanced by mechanism interpretability that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
- An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning [70.48605869773814]
Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information.
This study empirically evaluates the forgetting phenomenon in large language models during continual instruction tuning.
arXiv Detail & Related papers (2023-08-17T02:53:23Z)
- Understanding Emergent In-Context Learning from a Kernel Regression Perspective [55.95455089638838]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
This paper proposes a kernel-regression perspective for understanding the in-context learning (ICL) behavior of LLMs when faced with in-context examples.
We find that during ICL, the attention and hidden features in LLMs match the behavior of kernel regression.
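A toy rendering of that view (an assumption-laden sketch, not the paper's derivation): attention over in-context examples behaves like Nadaraya-Watson kernel regression, y(x) = Σ_i K(x, x_i) y_i / Σ_i K(x, x_i).

```python
import numpy as np

def kernel_regression(x_query, x_demo, y_demo, bandwidth=0.1):
    # Normalized kernel similarities play the role of attention weights.
    w = np.exp(-0.5 * (x_query - x_demo) ** 2 / bandwidth**2)
    return np.sum(w * y_demo) / np.sum(w)

x_demo = np.linspace(-1, 1, 8)
y_demo = np.sin(3 * x_demo)                    # toy in-context examples
print(kernel_regression(0.3, x_demo, y_demo))  # smooth interpolation of demos
```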
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Locally Interpretable Model Agnostic Explanations using Gaussian Processes [2.9189409618561966]
Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique for explaining the prediction of a single instance.
We propose a Gaussian Process (GP) based variation of locally interpretable models.
We demonstrate that the proposed technique generates faithful explanations using far fewer samples than LIME.
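One way such a GP-based local surrogate could look (the sampling scheme and attribution rule below are guesses for illustration, not the authors' method): perturb the instance, fit GP regression to the black-box outputs, and read a feature attribution off the surrogate's local gradient.

```python
import numpy as np

def black_box(X):  # stand-in for any opaque model to be explained
    return np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2

def rbf(A, B, l=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / l**2)

rng = np.random.default_rng(1)
x0 = np.array([0.2, -0.4])                    # the instance to explain
Xs = x0 + 0.1 * rng.standard_normal((50, 2))  # local perturbations around it
ys = black_box(Xs)

alpha = np.linalg.solve(rbf(Xs, Xs) + 1e-6 * np.eye(50), ys)
mean = lambda x: rbf(np.atleast_2d(x), Xs) @ alpha  # GP posterior mean surrogate

eps = 1e-4  # finite-difference gradient of the surrogate = local attribution
attr = [(mean(x0 + eps * e) - mean(x0 - eps * e))[0] / (2 * eps) for e in np.eye(2)]
print(attr)  # per-feature local effect around x0
```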
arXiv Detail & Related papers (2021-08-16T05:49:01Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance Guided Stochastic Gradient Descent (IGSGD) method to train models that infer from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
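The paper's contribution is the RL-adjusted gradient scheme; the sketch below shows only the simpler shared ingredient, imputation-free prediction from raw incomplete inputs via a missingness mask (a common alternative, not IGSGD itself).

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 3))
X[rng.random(X.shape) < 0.3] = np.nan          # incomplete observations
y = np.nansum(X, axis=1)

F = np.hstack([np.nan_to_num(X),               # zeros where values are missing
               (~np.isnan(X)).astype(float)])  # mask flags what was observed
w = np.linalg.lstsq(F, y, rcond=None)[0]       # train directly, no imputation
print(np.mean((F @ w - y) ** 2))               # predictions from raw incomplete X
```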
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic, exponentially fast decaying error bounds that apply to both the approximate kernel and the approximate posterior.
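To convey the flavor of the feature-expansion idea, here is a sketch using random rather than quadrature Fourier features (a Monte-Carlo stand-in for the paper's deterministic construction, shown only to illustrate the approximation k(x, x') ≈ φ(x)ᵀφ(x')):

```python
import numpy as np

rng = np.random.default_rng(2)
D, n = 200, 100                        # number of features / data points
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((1, D))        # spectral samples of the RBF kernel
b = rng.uniform(0, 2 * np.pi, D)
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Bayesian linear regression in feature space = approximate GP regression.
P = phi(X)
w = np.linalg.solve(P.T @ P + 0.01 * np.eye(D), P.T @ y)
Xt = np.linspace(-3, 3, 5)[:, None]
print(phi(Xt) @ w)                     # approximate GP posterior mean
```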
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.