Learning Linear Regression with Low-Rank Tasks in-Context
- URL: http://arxiv.org/abs/2510.04548v1
- Date: Mon, 06 Oct 2025 07:27:49 GMT
- Title: Learning Linear Regression with Low-Rank Tasks in-Context
- Authors: Kaito Takanami, Takashi Takahashi, Yoshiyuki Kabashima
- Abstract summary: In-context learning (ICL) is a key building block of modern large language models. We analyze a linear attention model trained on low-rank regression tasks. We find that statistical fluctuations in finite pre-training data induce an implicit regularization.
- Score: 8.347662730632047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
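The setting described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a standard simplification in which a single linear-attention head, applied to in-context regression, reduces to a learned matrix `Gamma` acting on the empirical correlation of context labels and inputs. The names (`sample_task`, `Gamma`, the choice `Gamma = d * I`) are illustrative, and the low-rank task structure is modeled by drawing weight vectors from a shared r-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_ctx = 32, 4, 64  # ambient dimension, task rank, context length

# Low-rank task structure: all regression weight vectors live in a
# shared r-dimensional subspace spanned by the columns of U.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

def sample_task():
    """Draw a task weight vector w = U c from the shared subspace."""
    return U @ rng.standard_normal(r)

def sample_context(w, n):
    """Sample n noiseless context pairs (x_i, y_i) with y_i = x_i . w."""
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    return X, X @ w

def linear_attention_predict(Gamma, X, y, x_q):
    """Linear-attention ICL predictor: yhat = x_q^T Gamma (X^T y / n)."""
    return x_q @ Gamma @ (X.T @ y) / len(y)

# With the naive choice Gamma = d * I, the predictor is a one-step
# correlation estimate of w; a trained Gamma could additionally exploit
# the low-rank structure U U^T, which is the phenomenon the paper studies.
w = sample_task()
X, y = sample_context(w, n_ctx)
x_q = rng.standard_normal(d) / np.sqrt(d)
pred = linear_attention_predict(d * np.eye(d), X, y, x_q)
print(float(pred), float(x_q @ w))
```

Because E[X^T X / n] = I/d under this input scaling, `Gamma = d * I` makes the inner factor an unbiased estimate of w, and its finite-sample fluctuations are exactly the kind of statistical effect the abstract attributes to implicit regularization.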
Related papers
- Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning [50.99796659680724]
This work investigates out-of-distribution (OOD) generalization in Transformer networks using a GSM8K-style modular arithmetic on computational graphs task as a testbed. We introduce and explore a set of four architectural mechanisms aimed at enhancing OOD generalization. We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.
arXiv Detail & Related papers (2025-10-15T21:03:59Z) - Pretrain-Test Task Alignment Governs Generalization in In-Context Learning [39.98824138502169]
In this work, we study how the structure of pretraining tasks governs generalization in ICL. Using a solvable model for ICL of linear regression by linear attention, we derive an exact expression for ICL generalization error in high dimensions. We show that this measure directly predicts ICL performance not only in the solvable model but also in nonlinear Transformers.
arXiv Detail & Related papers (2025-09-30T17:19:58Z) - Provable In-Context Learning of Nonlinear Regression with Transformers [66.99048542127768]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters. Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks. This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z) - Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods [48.038668788625465]
In-context learning (ICL) has achieved remarkable success in natural language and vision domains. In this work, we initiate a theoretical study of ICL for regression of Hölder functions on manifolds. Our findings provide foundational insights into the role of geometry in ICL and novel tools to study ICL of nonlinear models.
arXiv Detail & Related papers (2025-06-12T17:56:26Z) - Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning [50.53703102032562]
Large-scale Transformer language models (LMs) trained solely on next-token prediction with web-scale data can solve a wide range of tasks. The mechanism behind this capability, known as in-context learning (ICL), remains both controversial and poorly understood.
arXiv Detail & Related papers (2025-05-16T08:50:42Z) - Memory-Statistics Tradeoff in Continual Learning with Structural Regularization [27.154013172469853]
We study the statistical performance of a continual learning problem with two linear regression tasks in a well-specified random design setting. We consider a structural regularization algorithm that incorporates a generalized $\ell_2$-regularization tailored to the Hessian of the previous task for mitigating catastrophic forgetting.
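The regularized objective this summary describes can be sketched roughly as follows; the notation below ($\hat{w}_1$, $H_1$, $\lambda$) is assumed for illustration and is not taken from the paper:

```latex
\min_{w} \; \|X_2 w - y_2\|_2^2
  + \lambda \,(w - \hat{w}_1)^\top H_1 \,(w - \hat{w}_1)
```

where $\hat{w}_1$ is the estimate from the first task and $H_1$ is the Hessian of the first-task loss, so the penalty discourages movement along directions the first task constrained tightly.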
arXiv Detail & Related papers (2025-04-05T03:14:10Z) - Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z) - Transformers are Minimax Optimal Nonparametric In-Context Learners [36.291980654891496]
In-context learning of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples.
We develop approximation and generalization error bounds for a transformer composed of a deep neural network and one linear attention layer.
We show that sufficiently trained transformers can achieve -- and even improve upon -- the minimax optimal estimation risk in context.
arXiv Detail & Related papers (2024-08-22T08:02:10Z) - Information Guided Regularization for Fine-tuning Language Models [11.831883526217942]
We argue that a more surgical approach to regularization needs to exist for smoother transfer learning.
We devise a novel approach to dropout for improved model regularization and better downstream generalization.
arXiv Detail & Related papers (2024-06-20T05:18:37Z) - Asymptotic theory of in-context learning by linear attention [37.3817914656799]
In-context learning is a cornerstone of Transformers' success. Questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unresolved.
arXiv Detail & Related papers (2024-05-20T03:24:24Z) - What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of the pretrained model is bounded by the sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z) - Learning Trajectories are Generalization Indicators [44.53518627207067]
This paper explores the connection between learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities.
We present a novel perspective for analyzing generalization error by investigating the contribution of each update step to the change in generalization error.
Our approach can also track changes in generalization error when adjustments are made to learning rates and label noise levels.
arXiv Detail & Related papers (2023-04-25T05:08:57Z) - Task-agnostic Continual Learning with Hybrid Probabilistic Models [75.01205414507243]
We propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification.
The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting.
We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
arXiv Detail & Related papers (2021-06-24T05:19:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.