Probabilistic Pretraining for Neural Regression
- URL: http://arxiv.org/abs/2508.16355v1
- Date: Fri, 22 Aug 2025 13:03:04 GMT
- Title: Probabilistic Pretraining for Neural Regression
- Authors: Boris N. Oreshkin, Shiv Tavker, Dmitry Efimov
- Abstract summary: We introduce NIAQUE, Neural Interpretable Any-Quantile Estimation, a new model for transfer learning in probabilistic regression. We demonstrate that pre-training NIAQUE directly on diverse downstream regression datasets enhances performance on individual regression tasks. We also highlight the effectiveness of NIAQUE in Kaggle competitions against strong baselines involving tree-based models and the recent neural foundation models TabPFN and TabDPT.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning for probabilistic regression remains underexplored. This work closes that gap by introducing NIAQUE, Neural Interpretable Any-Quantile Estimation, a new model designed for transfer learning in probabilistic regression through permutation invariance. We demonstrate that pre-training NIAQUE directly on diverse downstream regression datasets and fine-tuning it on a specific target dataset enhances performance on individual regression tasks, showcasing the positive impact of probabilistic transfer learning. Furthermore, we highlight the effectiveness of NIAQUE in Kaggle competitions against strong baselines involving tree-based models and the recent neural foundation models TabPFN and TabDPT. These results establish NIAQUE as a robust and scalable framework for probabilistic regression that leverages transfer learning to improve predictive performance.
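The abstract leaves the training objective unstated, but any-quantile models are typically trained with the pinball (quantile) loss at quantile levels drawn at random per example. The PyTorch sketch below illustrates only that general recipe; the architecture, names, and sampling scheme are assumptions for illustration, not NIAQUE's actual implementation.

```python
# Minimal sketch of any-quantile training: the network takes the quantile
# level q as an extra input and is trained with the pinball loss at levels
# sampled uniformly per example. Everything here is illustrative.
import torch
import torch.nn as nn

class AnyQuantileRegressor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # One extra input carries the requested quantile level q in (0, 1).
        self.net = nn.Sequential(
            nn.Linear(n_features + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features), q: (batch, 1) -> predicted q-quantile of y|x
        return self.net(torch.cat([x, q], dim=-1)).squeeze(-1)

def pinball_loss(y_hat: torch.Tensor, y: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # Asymmetric absolute error: under-prediction is weighted by q,
    # over-prediction by (1 - q).
    diff = y - y_hat
    q = q.squeeze(-1)
    return torch.mean(torch.maximum(q * diff, (q - 1.0) * diff))

model = AnyQuantileRegressor(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 8), torch.randn(32)      # toy batch
q = torch.rand(32, 1)                           # fresh quantile per example
loss = pinball_loss(model(x, q), y, q)
opt.zero_grad(); loss.backward(); opt.step()
```

A permutation-invariant encoder over (feature, value) pairs, as the abstract suggests NIAQUE uses, would replace the plain MLP here.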
Related papers
- Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering [50.63386303357225]
We propose AdaRAS, a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations. AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference. Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including gains of over 13% on AIME-24 and AIME-25.
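The summary names a polarity-aware mean-difference criterion but does not define it; a toy rendering might look as follows. The synthetic data, threshold, and steering rule are illustrative assumptions, not the AdaRAS procedure itself.

```python
# Toy rendering of a polarity-aware mean-difference criterion for
# selecting "reasoning-critical" neurons, loosely modeled on the
# abstract. All quantities here are illustrative assumptions.
import numpy as np

def select_critical_neurons(acts_good: np.ndarray, acts_bad: np.ndarray, top_k: int = 8):
    """Rank neurons by the gap between mean activations on traces that
    reasoned correctly (acts_good) vs. ones that failed (acts_bad).
    The sign of the gap supplies the steering polarity."""
    gap = acts_good.mean(axis=0) - acts_bad.mean(axis=0)  # (n_neurons,)
    idx = np.argsort(-np.abs(gap))[:top_k]
    return idx, np.sign(gap[idx])

def steer(hidden: np.ndarray, idx: np.ndarray, polarity: np.ndarray, alpha: float = 0.1):
    """Nudge the selected neurons along their polarity at inference time."""
    out = hidden.copy()
    out[idx] += alpha * polarity
    return out

rng = np.random.default_rng(0)
acts_good = rng.normal(0.5, 1.0, size=(100, 256))  # (n_samples, n_neurons)
acts_bad = rng.normal(0.0, 1.0, size=(100, 256))
idx, polarity = select_critical_neurons(acts_good, acts_bad)
steered = steer(rng.normal(size=256), idx, polarity)
```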
arXiv Detail & Related papers (2026-01-27T17:53:01Z)
- AdaPRL: Adaptive Pairwise Regression Learning with Uncertainty Estimation for Universal Regression Tasks [0.0]
We propose AdaPRL, a novel adaptive pairwise learning framework for regression tasks. AdaPRL leverages the relative differences between data points, together with deep probabilistic models, to quantify the uncertainty associated with predictions. Experiments show that AdaPRL can be seamlessly integrated into recently proposed regression frameworks to gain performance improvements.
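As a rough illustration of the pairwise idea, one can supplement a pointwise regression loss with a loss on predicted target differences between randomly paired examples. This sketch is an assumption about the general shape of such an objective, not AdaPRL's published loss, which also models uncertainty.

```python
# Rough sketch of a pairwise regression objective: a pointwise loss plus
# a loss on predicted target differences between randomly paired batch
# examples. The pairing scheme and the weight lam are assumptions.
import torch
import torch.nn.functional as F

def pairwise_regression_loss(model, x, y, lam: float = 0.5):
    y_hat = model(x).squeeze(-1)
    pointwise = F.mse_loss(y_hat, y)
    perm = torch.randperm(x.shape[0])          # random partner per example
    diff_hat = y_hat - y_hat[perm]             # predicted relative difference
    diff_true = y - y[perm]                    # true relative difference
    return pointwise + lam * F.mse_loss(diff_hat, diff_true)

model = torch.nn.Linear(8, 1)
x, y = torch.randn(32, 8), torch.randn(32)
loss = pairwise_regression_loss(model, x, y)
loss.backward()
```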
arXiv Detail & Related papers (2025-01-10T09:19:10Z)
- Quantifying the Prediction Uncertainty of Machine Learning Models for Individual Data [2.1248439796866228]
This study investigates the learnability of the predictive normalized maximum likelihood (pNML) for linear regression and neural networks. It demonstrates that pNML can improve the performance and robustness of these models on various tasks.
arXiv Detail & Related papers (2024-12-10T13:58:19Z)
- Amortised Inference in Bayesian Neural Networks [0.0]
We introduce the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN).
We show that the amortised posterior approximations are of similar or better quality than those obtained through traditional variational inference.
We then discuss how the APOVI-BNN may be viewed as a new member of the neural process family.
arXiv Detail & Related papers (2023-09-06T14:02:33Z)
- Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments on multi-class classification, we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
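For readers unfamiliar with the construction, the classical linear-Gaussian case gives the flavor: the predictor's weights are latent states tracked by a Kalman filter. The recursion below is the textbook regression version, not the paper's classification model, which builds on a neural representation and needs further approximations.

```python
# Textbook Kalman-filter recursion over the weights of a linear
# predictor, illustrating the state-space idea in the abstract.
import numpy as np

def kalman_step(mu, P, phi, y, q=1e-3, r=1.0):
    """One predict/update step. mu, P: weight mean and covariance;
    phi: feature vector; y: observed target; q, r: process/observation noise."""
    P = P + q * np.eye(len(mu))        # predict: weights drift slowly
    y_hat = phi @ mu                   # predictive mean for this input
    s = phi @ P @ phi + r              # predictive variance
    k = P @ phi / s                    # Kalman gain
    mu = mu + k * (y - y_hat)          # correct the mean toward the error
    P = P - np.outer(k, phi @ P)       # shrink the covariance
    return mu, P

d = 5
mu, P = np.zeros(d), np.eye(d)
rng = np.random.default_rng(0)
for _ in range(100):                   # simulated stream of (phi, y) pairs
    phi = rng.standard_normal(d)
    y = phi @ np.ones(d) + 0.1 * rng.standard_normal()
    mu, P = kalman_step(mu, P, phi, y)
```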
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, however, next-event prediction models are trained on sequential data collected at a single point in time, which leaves them vulnerable to distribution shift at test time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Measuring and Reducing Model Update Regression in Structured Prediction for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose Backward-Congruent Re-ranking (BCR), a simple and effective method that takes the characteristics of structured outputs into account.
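A minimal sketch of the re-ranking step, under assumptions: the new model supplies a k-best list of structured outputs, both models expose a scoring function, and a convex mix of scores (our choice, not necessarily the paper's exact criterion) picks the final prediction.

```python
# Sketch of backward-congruent re-ranking: prefer candidates that the
# predecessor model also scores highly. Interfaces are hypothetical.
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def bcr_rerank(candidates: Iterable[T],
               new_score: Callable[[T], float],
               old_score: Callable[[T], float],
               alpha: float = 0.5) -> T:
    """Pick the candidate maximizing a convex mix of old/new model scores."""
    return max(candidates,
               key=lambda c: (1 - alpha) * new_score(c) + alpha * old_score(c))

# Hypothetical usage with toy scorers over candidate parse strings.
candidates = ["(S (NP A) (VP B))", "(S (NP A B))"]
best = bcr_rerank(candidates, new_score=len, old_score=lambda c: -len(c))
```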
arXiv Detail & Related papers (2022-02-07T07:04:54Z)
- Improving Music Performance Assessment with Contrastive Learning [78.8942067357231]
This study investigates contrastive learning as a potential method to improve existing music performance assessment (MPA) systems.
We introduce a weighted contrastive loss suitable for regression tasks and apply it to a convolutional neural network.
Our results show that contrastive-based methods are able to match or exceed SoTA performance on MPA regression tasks.
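One plausible form of such a loss, sketched under assumptions: pairs count as "similar" when their labels are close, and the push on dissimilar pairs grows with the label gap. The paper's exact weighting may differ.

```python
# Possible label-weighted contrastive loss for regression; the weighting
# and threshold are illustrative assumptions, not the paper's exact form.
import torch

def weighted_contrastive_loss(z, y, margin: float = 1.0, tau: float = 0.5):
    """z: (n, d) embeddings; y: (n,) regression labels."""
    d_z = torch.cdist(z, z)                   # pairwise embedding distances
    d_y = (y[:, None] - y[None, :]).abs()     # pairwise label gaps
    sim = (d_y < tau).float()                 # similar-pair indicator
    w = 1.0 + d_y                             # heavier push for far labels
    pull = sim * d_z.pow(2)
    push = (1 - sim) * w * torch.clamp(margin - d_z, min=0).pow(2)
    mask = 1 - torch.eye(len(y))              # ignore self-pairs
    return ((pull + push) * mask).mean()

z, y = torch.randn(16, 32), torch.rand(16) * 10
loss = weighted_contrastive_loss(z, y)
```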
arXiv Detail & Related papers (2021-08-03T19:24:25Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing, and analyzing regression errors in NLP model updates.
We formulate regression-free model updates as a constrained optimization problem.
We empirically analyze how model ensembling reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- A Locally Adaptive Interpretable Regression [7.4267694612331905]
Linear regression is one of the most interpretable prediction models.
In this work, we introduce a locally adaptive interpretable regression (LoAIR).
Our model achieves comparable or better predictive performance than other state-of-the-art baselines.
arXiv Detail & Related papers (2020-05-07T09:26:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.