Two in context learning tasks with complex functions
- URL: http://arxiv.org/abs/2502.03503v1
- Date: Wed, 05 Feb 2025 11:03:36 GMT
- Title: Two in context learning tasks with complex functions
- Authors: Omar Naim, Nicholas Asher
- Abstract summary: We examine two in-context learning (ICL) tasks with mathematical functions in several train and test settings for transformer models.
Our study generalizes work on linear functions by showing that small transformers, even models with attention layers only, can approximate arbitrary polynomial functions.
Our models can also approximate previously unseen classes of polynomial functions, as well as the zeros of complex functions.
- Score: 2.1178416840822027
- Abstract: We examine two in-context learning (ICL) tasks with mathematical functions in several train and test settings for transformer models. Our study generalizes work on linear functions by showing that small transformers, even models with attention layers only, can approximate arbitrary polynomial functions and hence continuous functions under certain conditions. Our models can also approximate previously unseen classes of polynomial functions, as well as the zeros of complex functions. Our models perform far better on this task than LLMs like GPT-4 and involve complex reasoning when provided with suitable training data and methods. Our models also have important limitations; they fail to generalize outside of training distributions and so do not learn class forms of functions. We explain why this is so.
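To make the task setup concrete, here is a minimal, illustrative sketch of the kind of in-context prompt typically used in such function-approximation experiments: each prompt is a sequence of (x, f(x)) pairs drawn from a randomly sampled polynomial, followed by a query x whose value the model must predict. The helper names (sample_polynomial, make_icl_prompt), the degrees, prompt length, and sampling ranges below are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

# Hypothetical sketch of an ICL function-approximation prompt: in-context
# examples (x, f(x)) from one random polynomial, plus a held-out query point.
# Degree, prompt length, and sampling ranges are illustrative assumptions.

def sample_polynomial(max_degree=3, coef_scale=1.0, rng=None):
    """Draw random coefficients c_0 ... c_d for a polynomial of random degree d."""
    rng = rng or np.random.default_rng()
    degree = rng.integers(1, max_degree + 1)
    return rng.normal(0.0, coef_scale, size=degree + 1)

def make_icl_prompt(coeffs, n_examples=20, x_range=(-1.0, 1.0), rng=None):
    """Build one prompt: n_examples of (x, f(x)), plus a query x and its target."""
    rng = rng or np.random.default_rng()
    xs = rng.uniform(*x_range, size=n_examples + 1)
    ys = np.polyval(coeffs[::-1], xs)              # f(x) = c_0 + c_1 x + ... + c_d x^d
    context = np.stack([xs[:-1], ys[:-1]], axis=1)  # in-context examples
    query_x, target_y = xs[-1], ys[-1]              # model should predict target_y
    return context, query_x, target_y

rng = np.random.default_rng(0)
coeffs = sample_polynomial(rng=rng)
context, query_x, target_y = make_icl_prompt(coeffs, rng=rng)
print(context.shape, query_x, target_y)
```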
Related papers
- Re-examining learning linear functions in context [1.8843687952462742]
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks.
We explore a simple model of ICL in a controlled setup with synthetic training data.
Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches to learn a linear function in-context.
arXiv Detail & Related papers (2024-11-18T10:58:46Z) - In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z) - Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks [5.358878931933351]
We study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks.
Specifically, we consider a finite collection of linear modular functions $z = a x + b y \;\mathrm{mod}\; p$ labeled by the vector $(a, b) \in \mathbb{Z}_p^2$ (see the illustrative sketch after this list).
arXiv Detail & Related papers (2024-06-04T17:59:36Z) - Piecewise Polynomial Regression of Tame Functions via Integer Programming [2.2499166814992435]
We consider tame functions, a class of nonsmooth functions that includes all common activation functions, value functions of mixed-integer programs, and wave functions of small molecules.
arXiv Detail & Related papers (2023-11-22T17:37:42Z) - Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions [32.59746882017483]
We show that Transformers can learn to implement two distinct algorithms to solve a single task.
We also show that extant Large Language Models (LLMs) can compete with nearest-neighbor baselines on prediction tasks.
arXiv Detail & Related papers (2023-10-04T17:57:33Z) - What Can Transformers Learn In-Context? A Case Study of Simple Function Classes [67.06980111346245]
In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples.
We show that standard Transformers can be trained from scratch to perform in-context learning of linear functions.
We also show that we can train Transformers to in-context learn more complex function classes with performance that matches or exceeds task-specific learning algorithms.
arXiv Detail & Related papers (2022-08-01T18:01:40Z) - Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z) - Learning outside the Black-Box: The pursuit of interpretable models [78.32475359554395]
This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function.
Our interpretation represents a leap forward from the previous state of the art.
arXiv Detail & Related papers (2020-11-17T12:39:44Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - From Sets to Multisets: Provable Variational Inference for Probabilistic Integer Submodular Models [82.95892656532696]
Submodular functions have been studied extensively in machine learning and data mining.
In this work, we propose a continuous DR-submodular extension for integer submodular functions.
We formulate a new probabilistic model which is defined through integer submodular functions.
arXiv Detail & Related papers (2020-06-01T22:20:45Z)
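As a companion to the modular-arithmetic entry above, the following illustrative Python snippet enumerates the $p^2$ tasks labeled by $(a, b) \in \mathbb{Z}_p^2$ and samples in-context examples $(x, y, z)$ with $z = (a x + b y) \bmod p$. The helper name modular_task_examples, the prompt length, and the choice of $p$ are arbitrary assumptions, not values taken from that paper.

```python
import itertools
import numpy as np

# Illustrative sketch (details assumed, not taken from the paper): the task
# family is z = (a*x + b*y) mod p, with one task per label (a, b) in Z_p^2.

def modular_task_examples(a, b, p, n_examples=16, rng=None):
    """Sample in-context examples (x, y, z) for the task labeled by (a, b)."""
    rng = rng or np.random.default_rng()
    xs = rng.integers(0, p, size=n_examples)
    ys = rng.integers(0, p, size=n_examples)
    zs = (a * xs + b * ys) % p
    return np.stack([xs, ys, zs], axis=1)

p = 5
all_tasks = list(itertools.product(range(p), repeat=2))  # the p^2 labels (a, b)
print(len(all_tasks))                                     # 25 tasks for p = 5
print(modular_task_examples(a=2, b=3, p=p, n_examples=4))
```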