Related papers: Statistical Foundations of Prior-Data Fitted Networks

URL: http://arxiv.org/abs/2305.11097v1
Date: Thu, 18 May 2023 16:34:21 GMT
Title: Statistical Foundations of Prior-Data Fitted Networks
Authors: Thomas Nagler
Abstract summary: Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning. This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
Score: 0.7614628596146599
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning. Instead of training the network to an observed training set, a fixed model is pre-trained offline on small, simulated training sets from a variety of tasks. The pre-trained model is then used to infer class probabilities in-context on fresh training sets with arbitrary size and distribution. Empirically, PFNs achieve state-of-the-art performance on tasks with similar size to the ones used in pre-training. Surprisingly, their accuracy further improves when passed larger data sets during inference. This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior. While PFNs are motivated by Bayesian ideas, a purely frequentistic interpretation of PFNs as pre-tuned, but untrained predictors explains their behavior. A predictor's variance vanishes if its sensitivity to individual training samples does and the bias vanishes only if it is appropriately localized around the test feature. The transformer architecture used in current PFN implementations ensures only the former. These findings shall prove useful for designing architectures with favorable empirical behavior.
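The bias and variance statement in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's construction and does not involve a PFN or a transformer; the data-generating function `true_prob`, the Gaussian kernel, and the bandwidth rule `n ** -0.2` are all assumptions chosen for illustration. It compares two predictors in which each training sample's influence shrinks as the sample grows: a global average that ignores the test feature, and a kernel-weighted average localized around it.

```python
import math
import random


def true_prob(x):
    """Hypothetical class-1 probability P(Y=1 | X=x), chosen for illustration."""
    return 1.0 / (1.0 + math.exp(-4.0 * x))


def simulate(n, rng):
    """Draw n labelled samples with X uniform on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < true_prob(x) else 0 for x in xs]
    return xs, ys


def global_average(xs, ys, x_test):
    """Non-localized predictor: ignores x_test and weights all samples equally."""
    return sum(ys) / len(ys)


def local_average(xs, ys, x_test):
    """Localized predictor: Gaussian kernel weights centred on x_test.

    The bandwidth n ** -0.2 is an assumed rule that shrinks with sample size.
    """
    bandwidth = len(xs) ** -0.2
    ws = [math.exp(-0.5 * ((x - x_test) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)


def bias_and_variance(predict, n, x_test, reps=200, seed=0):
    """Monte Carlo estimate of a predictor's bias and variance at x_test."""
    rng = random.Random(seed)
    preds = []
    for _ in range(reps):
        xs, ys = simulate(n, rng)
        preds.append(predict(xs, ys, x_test))
    mean = sum(preds) / reps
    var = sum((p - mean) ** 2 for p in preds) / reps
    return mean - true_prob(x_test), var


if __name__ == "__main__":
    for name, predictor in [("global", global_average), ("local", local_average)]:
        for n in (50, 2000):
            bias, var = bias_and_variance(predictor, n, x_test=0.5)
            print(f"{name:6s} n={n:5d} bias={bias:+.3f} variance={var:.5f}")
```

In this toy setting the variance of both predictors shrinks with the sample size, but only the localized predictor's bias shrinks as well, which mirrors the distinction the abstract draws between sensitivity to individual samples and localization around the test feature.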

Related papers

Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning [21.440470901377182]
Initializing with pre-trained models is becoming standard practice in machine learning. We study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg.
arXiv Detail & Related papers (2025-02-11T23:53:16Z)
Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation [39.7344214193566]
We introduce a pioneering test-time adaptation framework tailored for time series forecasting (TSF). TAFAS, the proposed approach to TSF-TTA, flexibly adapts source forecasters to continuously shifting test distributions while preserving the core semantic information learned during pre-training. The novel utilization of partially observed ground truth and a gated calibration module enables proactive, robust, and model-agnostic adaptation of source forecasters.
arXiv Detail & Related papers (2025-01-09T04:59:15Z)
Test-Time Alignment via Hypothesis Reweighting [56.71167047381817]
Large pretrained models often struggle with underspecified tasks. We propose a novel framework to address the challenge of aligning models to test-time user intent.
arXiv Detail & Related papers (2024-12-11T23:02:26Z)
Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others. We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data. Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
Examining the Effect of Pre-training on Time Series Classification [21.38211396933795]
This study investigates the impact of pre-training followed by fine-tuning on the fine-tuning process. We conducted a thorough examination of 150 classification datasets. We find that pre-training can only help improve the optimization process for models that fit the data poorly. Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume.
arXiv Detail & Related papers (2023-09-11T06:26:57Z)
An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training [0.0]
Generative networks like invertible neural networks (INN) enable a probabilistic unfolding. We introduce the iterative conditional INN (IcINN) for unfolding that adjusts for deviations between simulated training samples and data.
arXiv Detail & Related papers (2022-12-16T19:00:05Z)
Boosted Dynamic Neural Networks [53.559833501288146]
A typical early-exiting dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone. To optimize the model, these prediction heads are trained together with the network backbone on every batch of training data. Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions. We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation. We observe two drawbacks stemming from their training procedure and loss definition, which hinder their direct deployment as proxies in black-box optimisation. We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser that trades off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next. In such settings, there is a distinct type of distribution shift between the training and test data. We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is a performant source of diversity. We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks. We introduce a new scoring method that casts a plausibility ranking task in a full-text format. We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.