From Text to Trajectories: GPT-2 as an ODE Solver via In-Context Learning
- URL: http://arxiv.org/abs/2508.03031v1
- Date: Tue, 05 Aug 2025 03:16:37 GMT
- Title: From Text to Trajectories: GPT-2 as an ODE Solver via In-Context Learning
- Authors: Ziyang Ma, Baojian Zhou, Deqing Yang, Yanghua Xiao
- Abstract summary: In-Context Learning (ICL) has emerged as a new paradigm in large language models (LLMs). This paper investigates whether LLMs can solve ordinary differential equations (ODEs) under the ICL setting. Experiments on two types of ODEs show that GPT-2 can effectively learn a meta-ODE algorithm, with convergence behavior comparable to, or better than, the Euler method.
- Score: 44.198609457344574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-Context Learning (ICL) has emerged as a new paradigm in large language models (LLMs), enabling them to perform novel tasks by conditioning on a few examples embedded in the prompt. Yet, the highly nonlinear behavior of ICL for NLP tasks remains poorly understood. To shed light on its underlying mechanisms, this paper investigates whether LLMs can solve ordinary differential equations (ODEs) under the ICL setting. We formulate standard ODE problems and their solutions as sequential prompts and evaluate GPT-2 models on these tasks. Experiments on two types of ODEs show that GPT-2 can effectively learn a meta-ODE algorithm, with convergence behavior comparable to, or better than, the Euler method, and achieve exponential accuracy gains with increasing numbers of demonstrations. Moreover, the model generalizes to out-of-distribution (OOD) problems, demonstrating robust extrapolation capabilities. These empirical findings provide new insights into the mechanisms of ICL in NLP and its potential for solving nonlinear numerical problems.
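The abstract describes two concrete ingredients: serializing an ODE trajectory into a sequential prompt of demonstration pairs, and benchmarking the model's predictions against the explicit Euler method. The sketch below (Python, not the authors' code) illustrates one plausible version of that setup; the prompt format, the example ODE y' = -2y, and the helper names `euler_trajectory` and `format_icl_prompt` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: build an ODE trajectory with the explicit Euler method
# (the classical baseline the paper compares against) and serialize it as
# a sequential in-context prompt of (t, y) demonstrations plus a query.
import numpy as np

def euler_trajectory(f, y0, t0, t1, n_steps):
    """Explicit Euler integration of y' = f(t, y) on [t0, t1]."""
    ts = np.linspace(t0, t1, n_steps + 1)
    ys = np.empty_like(ts)
    ys[0] = y0
    h = (t1 - t0) / n_steps
    for k in range(n_steps):
        ys[k + 1] = ys[k] + h * f(ts[k], ys[k])
    return ts, ys

def format_icl_prompt(ts, ys, t_query, precision=4):
    """Serialize (t, y) demonstrations into a sequential text prompt."""
    demos = " ".join(f"t={t:.{precision}f} y={y:.{precision}f}"
                     for t, y in zip(ts, ys))
    return f"{demos} t={t_query:.{precision}f} y="

# Example problem: y' = -2y with y(0) = 1, exact solution exp(-2t).
f = lambda t, y: -2.0 * y
ts, ys = euler_trajectory(f, y0=1.0, t0=0.0, t1=1.0, n_steps=8)

# Prompt with 9 demonstration pairs and a query one step beyond them.
print(format_icl_prompt(ts, ys, t_query=1.125))

# Euler baseline error at t = 1 versus the exact solution.
print(abs(ys[-1] - np.exp(-2.0)))
```

Increasing `n_steps` lengthens the prompt (more demonstrations) while also tightening the Euler baseline, which is one way the paper's comparison between ICL accuracy and Euler convergence could be organized.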
Related papers
- Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks [48.105258051884384]
This paper proposes a new two-stage training framework that enhances models' self-correction capabilities. During the first stage, a multi-turn dialogue strategy guides the model to generate long chain-of-thought (CoT) data. The second stage employs a difficulty-aware rejection sampling mechanism to dynamically optimize data distribution.
arXiv Detail & Related papers (2026-01-09T08:19:11Z) - SciML Agents: Write the Solver, Not the Solution [69.5021018644143]
We introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems, and a large-scale benchmark of 1,000 diverse ODE tasks. We evaluate open- and closed-source LLMs along two axes: (i) unguided versus guided prompting with domain-specific knowledge, and (ii) off-the-shelf versus fine-tuned variants. Preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
arXiv Detail & Related papers (2025-09-12T02:53:57Z) - Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
arXiv Detail & Related papers (2024-11-26T15:35:44Z) - Can a Large Language Model Learn Matrix Functions In Context? [3.7478782183628634]
Large Language Models (LLMs) have demonstrated the ability to solve complex tasks through In-Context Learning (ICL).
This paper explores the capacity of LLMs to solve non-linear numerical computations, with specific emphasis on functions of the Singular Value Decomposition.
arXiv Detail & Related papers (2024-11-24T00:33:43Z) - Re-examining learning linear functions in context [1.8843687952462742]
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks. We explore a simple model of ICL in a controlled setup with synthetic training data. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches to learn a linear function in-context.
arXiv Detail & Related papers (2024-11-18T10:58:46Z) - LLM4ED: Large Language Models for Automatic Equation Discovery [0.8644909837301149]
We introduce a new framework that utilizes natural language-based prompts to guide large language models in automatically mining governing equations from data.
Specifically, we first utilize the generation capability of LLMs to generate diverse equations in string form, and then evaluate the generated equations based on observations.
Experiments are extensively conducted on both partial differential equations and ordinary differential equations.
arXiv Detail & Related papers (2024-05-13T14:03:49Z) - Near-Optimal Solutions of Constrained Learning Problems [85.48853063302764]
With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent.
This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. Our results show that rich parametrizations effectively mitigate the non-convexity of constrained learning problems.
arXiv Detail & Related papers (2024-03-18T14:55:45Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples [46.025105938192624]
Vision Transformer (ViT) has performed remarkably in various computer vision tasks.
ViT usually suffers from serious overfitting problems with a relatively limited number of training samples.
We propose a novel model binarization technique, called Group Superposition Binarization (GSB).
arXiv Detail & Related papers (2023-05-13T14:48:09Z) - Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations [5.33024001730262]
LODE-GPs follow a system of linear homogeneous ODEs with constant coefficients.
We show the effectiveness of LODE-GPs in a number of experiments.
arXiv Detail & Related papers (2022-08-26T09:16:53Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)