Analyzing Generalization in Pre-Trained Symbolic Regression
- URL: http://arxiv.org/abs/2509.19849v1
- Date: Wed, 24 Sep 2025 07:47:02 GMT
- Title: Analyzing Generalization in Pre-Trained Symbolic Regression
- Authors: Henrik Voigt, Paul Kahlmeyer, Kai Lawonn, Michael Habeck, Joachim Giesen
- Abstract summary: Symbolic regression algorithms search a space of mathematical expressions for formulas that explain given data. Transformer-based models have emerged as a promising, scalable approach, shifting the expensive search to a large-scale pre-training phase.
- Score: 17.789199791229624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression algorithms search a space of mathematical expressions for formulas that explain given data. Transformer-based models have emerged as a promising, scalable approach, shifting the expensive combinatorial search to a large-scale pre-training phase. However, the success of these models is critically dependent on their pre-training data, and their ability to generalize to problems outside of this pre-training distribution remains largely unexplored. In this work, we conduct a systematic empirical study to evaluate the generalization capabilities of pre-trained, transformer-based symbolic regression. We rigorously test the performance of several state-of-the-art approaches both within the pre-training distribution and on a series of out-of-distribution challenges. Our findings reveal a significant dichotomy: while pre-trained models perform well in-distribution, their performance consistently degrades in out-of-distribution scenarios. We conclude that this generalization gap is a critical barrier for practitioners, as it severely limits the practical use of pre-trained approaches for real-world applications.
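The evaluation protocol behind this finding can be made concrete. Below is a minimal, runnable sketch of an in-distribution vs. out-of-distribution comparison: a regressor is fit on inputs from one range and scored both inside and outside that range. The Gaussian process is a stand-in for a pre-trained symbolic regression model (an assumption; the transformer-based systems evaluated in the paper expose different interfaces), and the target formula and sampling ranges are illustrative choices.

```python
# Sketch: measure the generalization gap by scoring a fitted model on test
# inputs drawn inside vs. outside the range it was fit on. Assumptions: the
# GaussianProcessRegressor stands in for a pre-trained SR model, and the
# target formula f and the sampling ranges are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
f = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2   # ground-truth formula

X_fit = rng.uniform(-1.0, 1.0, size=(256, 2))        # "pre-training" input range
model = GaussianProcessRegressor().fit(X_fit, f(X_fit))

X_in = rng.uniform(-1.0, 1.0, size=(256, 2))         # in-distribution test set
X_ood = rng.uniform(3.0, 5.0, size=(256, 2))         # out-of-distribution test set

for name, X in [("in-distribution", X_in), ("out-of-distribution", X_ood)]:
    print(f"{name}: R^2 = {r2_score(f(X), model.predict(X)):.3f}")
```

On this toy problem the in-distribution R^2 should be near 1 while the out-of-distribution R^2 collapses, mirroring the dichotomy reported in the abstract.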
Related papers
- The Coverage Principle: How Pre-Training Enables Post-Training [70.25788947586297]
We study how pre-training shapes the success of the final model. We uncover a mechanism that explains the power of coverage in predicting downstream performance.
arXiv Detail & Related papers (2025-10-16T17:53:50Z)
- Multiply Robust Conformal Risk Control with Coarsened Data [0.0]
Conformal Prediction (CP) has recently received a tremendous amount of interest. In this paper, we consider the general problem of obtaining distribution-free valid prediction regions for an outcome given coarsened data. Our principled use of semiparametric theory has the key advantage of facilitating flexible machine learning methods.
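For readers unfamiliar with CP, the following is a minimal sketch of split conformal prediction, the basic recipe the field builds on; the paper's multiply robust, coarsened-data machinery goes well beyond it. The choice of point predictor and the miscoverage level alpha are illustrative assumptions.

```python
# Sketch of split conformal prediction for regression: calibration residuals
# yield distribution-free prediction intervals under exchangeability.
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    # Fit any point predictor on the training split.
    model = LinearRegression().fit(X_train, y_train)
    # Nonconformity scores: absolute residuals on the calibration split.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0))
    pred = model.predict(X_test)
    # Intervals with marginal coverage >= 1 - alpha.
    return pred - q, pred + q
```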
arXiv Detail & Related papers (2025-08-21T12:14:44Z)
- A Classical View on Benign Overfitting: The Role of Sample Size [14.36840959836957]
We focus on almost benign overfitting, where models simultaneously achieve arbitrarily small training and test errors. This behavior is characteristic of neural networks, which often achieve low (but non-zero) training error while still generalizing well.
arXiv Detail & Related papers (2025-05-16T18:37:51Z)
- In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
- Histogram Approaches for Imbalanced Data Streams Regression [1.8385275253826225]
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. This study introduces histogram-based sampling strategies to overcome this constraint. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy.
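As a rough illustration of the histogram idea (a batch oversampling sketch only; the paper's HistUS and HistOS operate on data streams and differ in detail), targets can be binned into a histogram and examples from sparse bins resampled:

```python
# Sketch: histogram-based oversampling for imbalanced regression targets.
# Bin counts, n_bins, and the balance-to-largest-bin rule are illustrative.
import numpy as np

def histogram_oversample(X, y, n_bins=10, seed=0):
    rng = np.random.default_rng(seed)
    edges = np.histogram_bin_edges(y, bins=n_bins)
    bins = np.digitize(y, edges[1:-1])          # bin index 0..n_bins-1 per target
    counts = np.bincount(bins, minlength=n_bins)
    idx = []
    for b in range(n_bins):
        members = np.flatnonzero(bins == b)
        if members.size == 0:
            continue
        idx.extend(members)                     # keep the original examples
        # Replicate rare-bin examples until the bin matches the largest bin.
        idx.extend(rng.choice(members, counts.max() - members.size, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]
```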
arXiv Detail & Related papers (2025-01-29T11:03:02Z)
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
- Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z)
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning and fine-tuning in linear models for both regression and binary classification. In particular, we consider the use of stochastic gradient descent (SGD) on a linear model with pretrained weights and a small training data set from the target distribution.
arXiv Detail & Related papers (2024-10-03T03:09:09Z)
- On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z)
- It's All in the Mix: Wasserstein Classification and Regression with Mixed Features [5.106912532044251]
We develop and analyze distributionally robust prediction models that faithfully account for the presence of discrete features. We demonstrate that our models can significantly outperform existing methods that are agnostic to the presence of discrete features.
arXiv Detail & Related papers (2023-12-19T15:15:52Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or data with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Semantic Self-adaptation: Enhancing Generalization with a Single Sample [45.111358665370524]
We propose a self-adaptive approach for semantic segmentation.
It fine-tunes the parameters of convolutional layers to the input image using consistency regularization.
Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time.
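A hedged sketch of this idea follows (assuming a single horizontal flip as the augmented view, a KL consistency loss, and Adam over the convolutional layers; the authors' exact augmentations and objective may differ).

```python
# Sketch: single-sample test-time self-adaptation for semantic segmentation
# via consistency regularization. The flip augmentation, KL loss, and the
# helper name self_adapt are illustrative assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def self_adapt(model, image, steps=10, lr=1e-4):
    """Fine-tune `model` on one (1, C, H, W) image, then return its prediction."""
    # Adapt only the convolutional layers, as described in the summary above.
    conv_params = [p for m in model.modules()
                   if isinstance(m, torch.nn.Conv2d)
                   for p in m.parameters()]
    opt = torch.optim.Adam(conv_params, lr=lr)
    for _ in range(steps):
        flipped = torch.flip(image, dims=[-1])              # augmented view
        logits = model(image)                               # (1, K, H, W)
        logits_aug = torch.flip(model(flipped), dims=[-1])  # align with original
        # Consistency loss: push the augmented view toward the (detached)
        # prediction on the original view.
        loss = F.kl_div(F.log_softmax(logits_aug, dim=1),
                        F.softmax(logits, dim=1).detach(),
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(image).argmax(dim=1)                   # per-pixel labels
```

Since adaptation is per sample, the pre-trained weights would be restored (or the model re-cloned) between images so that updates do not accumulate.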
arXiv Detail & Related papers (2022-08-10T12:29:01Z)