Statistical Foundations of Prior-Data Fitted Networks
- URL: http://arxiv.org/abs/2305.11097v1
- Date: Thu, 18 May 2023 16:34:21 GMT
- Title: Statistical Foundations of Prior-Data Fitted Networks
- Authors: Thomas Nagler
- Abstract summary: Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior-data fitted networks (PFNs) were recently proposed as a new paradigm
for machine learning. Instead of fitting the network to an observed training
set, a fixed model is pre-trained offline on small, simulated training sets
from a variety of tasks. The pre-trained model is then used to infer class
probabilities in-context on fresh training sets with arbitrary size and
distribution. Empirically, PFNs achieve state-of-the-art performance on tasks
with similar size to the ones used in pre-training. Surprisingly, their
accuracy further improves when passed larger data sets during inference. This
article establishes a theoretical foundation for PFNs and illuminates the
statistical mechanisms governing their behavior. While PFNs are motivated by
Bayesian ideas, a purely frequentist interpretation of PFNs as pre-tuned but
untrained predictors explains their behavior. A predictor's variance vanishes
if its sensitivity to individual training samples does, and the bias vanishes
only if it is appropriately localized around the test feature. The transformer
architecture used in current PFN implementations ensures only the former. These
findings shall prove useful for designing architectures with favorable
empirical behavior.
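The bias and variance statement in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's construction and does not involve a transformer: it assumes a one-dimensional feature, a made-up class probability `true_prob`, and a predictor that is a weighted average of training labels. All function names, the bandwidth rule `0.5 * n ** (-0.2)`, and the test point are hypothetical choices made for illustration only. With uniform weights, each sample has influence 1/n, so the variance shrinks with n while the bias stays put; with kernel weights concentrated around the test feature, both shrink.

```python
import numpy as np


def true_prob(x):
    """Hypothetical class-1 probability P(Y = 1 | X = x) for the toy problem."""
    return 1.0 / (1.0 + np.exp(-4.0 * x))


def weighted_average_predictor(x_train, y_train, x_test, bandwidth=None):
    """Predict P(Y = 1 | X = x_test) as a weighted average of training labels.

    bandwidth=None gives uniform weights (no localization); otherwise a
    Gaussian kernel concentrates the weights around x_test.
    """
    if bandwidth is None:
        weights = np.ones_like(x_train)
    else:
        weights = np.exp(-0.5 * ((x_train - x_test) / bandwidth) ** 2)
    return float(np.sum(weights * y_train) / np.sum(weights))


def bias_and_variance(n, x_test, localized, n_rep=200, seed=0):
    """Monte Carlo estimate of the predictor's bias and variance at x_test."""
    rng = np.random.default_rng(seed)
    # Assumed bandwidth rule: shrink slowly as the training set grows.
    bandwidth = 0.5 * n ** (-0.2) if localized else None
    preds = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.uniform(-1.0, 1.0, size=n)
        y = rng.binomial(1, true_prob(x))
        preds[r] = weighted_average_predictor(x, y, x_test, bandwidth)
    return preds.mean() - true_prob(x_test), preds.var()


if __name__ == "__main__":
    for localized in (False, True):
        for n in (100, 5000):
            bias, var = bias_and_variance(n, x_test=0.5, localized=localized)
            print(f"localized={localized} n={n} bias={bias:+.4f} var={var:.6f}")
```

In this toy setting the uniform-weight predictor estimates the marginal class frequency rather than the conditional probability at the test feature, which is why more data does not remove its bias.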
Related papers
- Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning [21.440470901377182]
Initializing with pre-trained models is becoming standard practice in machine learning.
We study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg.
arXiv Detail & Related papers (2025-02-11T23:53:16Z)
- Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation [39.7344214193566]
We introduce a pioneering test-time adaptation framework tailored for time series forecasting (TSF).
TAFAS, the proposed approach to TSF-TTA, flexibly adapts source forecasters to continuously shifting test distributions while preserving the core semantic information learned during pre-training.
The novel use of partially observed ground truth and a gated calibration module enables proactive, robust, and model-agnostic adaptation of source forecasters.
arXiv Detail & Related papers (2025-01-09T04:59:15Z)
- Test-Time Alignment via Hypothesis Reweighting [56.71167047381817]
Large pretrained models often struggle with underspecified tasks.
We propose a novel framework to address the challenge of aligning models to test-time user intent.
arXiv Detail & Related papers (2024-12-11T23:02:26Z)
- An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training [0.0]
Generative networks like invertible neural networks (INN) enable a probabilistic unfolding.
We introduce the iterative conditional INN (IcINN) for unfolding that adjusts for deviations between simulated training samples and data.
arXiv Detail & Related papers (2022-12-16T19:00:05Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical early-exiting dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser that trades off accuracy against input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.