Sample-Efficient Optimisation with Probabilistic Transformer Surrogates
- URL: http://arxiv.org/abs/2205.13902v2
- Date: Mon, 30 May 2022 08:55:32 GMT
- Title: Sample-Efficient Optimisation with Probabilistic Transformer Surrogates
- Authors: Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, Rasul Tutunov,
Jun Wang, Haitham Bou Ammar
- Abstract summary: This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
- Score: 66.98962321504085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Faced with problems of increasing complexity, recent research in Bayesian
Optimisation (BO) has focused on adapting deep probabilistic models as flexible
alternatives to Gaussian Processes (GPs). In a similar vein, this paper
investigates the feasibility of employing state-of-the-art probabilistic
transformers in BO. Upon further investigation, we observe two drawbacks
stemming from their training procedure and loss definition, hindering their
direct deployment as proxies in black-box optimisation. First, we notice that
these models are trained on uniformly distributed inputs, which impairs
predictive accuracy on non-uniform data - a setting arising from any typical BO
loop due to exploration-exploitation trade-offs. Second, we realise that
training losses (e.g., cross-entropy) only asymptotically guarantee accurate
posterior approximations, i.e., after arriving at the global optimum, which
generally cannot be ensured. At the stationary points of the loss function,
however, we observe a degradation in predictive performance, especially in
exploratory regions of the input space. To tackle these shortcomings, we
introduce two components: 1) a BO-tailored training prior supporting
non-uniformly distributed points, and 2) a novel approximate posterior
regulariser trading off accuracy and input sensitivity to filter favourable
stationary points for improved predictive performance. In a large panel of
experiments, we demonstrate, for the first time, that one transformer
pre-trained on data sampled from random GP priors produces competitive results
on 16 benchmark black-boxes compared to GP-based BO. Since our model is only
pre-trained once and used in all tasks without any retraining or fine-tuning,
we report an order-of-magnitude time reduction, while matching and sometimes
outperforming GPs.
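
As a rough illustration of how such a pre-trained probabilistic surrogate slots into a standard BO loop without per-task retraining, the sketch below pairs a placeholder surrogate with an Expected Improvement acquisition. The `TransformerSurrogate` class, its toy distance-weighted predictor, and the one-dimensional set-up are assumptions made purely for illustration; they do not reproduce the paper's transformer architecture, BO-tailored training prior, or posterior regulariser.

```python
# Minimal sketch: BO loop with a pre-trained probabilistic surrogate used as a
# drop-in replacement for a GP. The surrogate interface is an assumption; the
# paper's actual model, training prior, and regulariser are not reproduced.
import numpy as np
from scipy.stats import norm


class TransformerSurrogate:
    """Placeholder for a pre-trained probabilistic surrogate.

    A real model would condition on (X_obs, y_obs) in a single forward pass and
    return an approximate posterior at the query points; here a toy
    distance-weighted predictor stands in so the sketch runs end to end.
    """

    def predict(self, X_obs, y_obs, X_query):
        d = np.abs(X_query[:, None] - X_obs[None, :])   # (n_query, n_obs)
        w = np.exp(-(d / 0.2) ** 2) + 1e-12
        mean = (w * y_obs[None, :]).sum(1) / w.sum(1)
        std = 1.0 / np.sqrt(w.sum(1))                   # grows away from the data
        return mean, std


def expected_improvement(mean, std, best_y):
    """EI for minimisation: E[max(best_y - f, 0)] under a Gaussian posterior."""
    z = (best_y - mean) / std
    return (best_y - mean) * norm.cdf(z) + std * norm.pdf(z)


def bo_loop(black_box, surrogate, n_init=3, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=n_init)              # initial uniform design
    y = np.array([black_box(x) for x in X])
    grid = np.linspace(0.0, 1.0, 512)                   # candidate set

    for _ in range(n_iter):
        # One forward pass per iteration: no retraining or fine-tuning per task.
        mean, std = surrogate.predict(X, y, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y.min()))]
        X = np.append(X, x_next)
        y = np.append(y, black_box(x_next))
        # X becomes increasingly non-uniform as exploitation concentrates
        # queries: the distribution mismatch the BO-tailored prior targets.
    return X[np.argmin(y)], y.min()


if __name__ == "__main__":
    x_best, y_best = bo_loop(lambda x: (x - 0.37) ** 2, TransformerSurrogate())
    print(f"best x = {x_best:.3f}, best y = {y_best:.4f}")
```

Because the surrogate only conditions on the observed data at prediction time, nothing inside the loop is retrained, which is the source of the time savings reported relative to GP-based BO.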
Related papers
- Robust Bayesian Optimization via Localized Online Conformal Prediction [37.549297668783254]
We introduce localized online conformal prediction-based Bayesian optimization (LOCBO).
LOCBO calibrates the GP model through localized online conformal prediction (CP).
We provide theoretical performance guarantees for LOCBO's iterates that hold for the unobserved objective function.
arXiv Detail & Related papers (2024-11-26T12:45:54Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - Statistical Foundations of Prior-Data Fitted Networks [0.7614628596146599]
Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
arXiv Detail & Related papers (2023-05-18T16:34:21Z) - Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses predictions obtained by prompting the pretrained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z) - Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer the data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing an entropy loss (a generic sketch of this idea follows the list below).
arXiv Detail & Related papers (2022-05-20T14:33:39Z) - Local Gaussian process extrapolation for BART models with applications
to causal inference [0.7734726150561088]
This paper proposes a novel extrapolation strategy that grafts Gaussian processes to the leaf nodes in BART for predicting points outside the range of the observed data.
In simulation studies, the new approach boasts superior performance compared to popular alternatives such as Jackknife+.
arXiv Detail & Related papers (2022-04-23T00:37:53Z) - Reducing the Amortization Gap in Variational Autoencoders: A Bayesian
Random Function Approach [38.45568741734893]
Inference in our GP model is done by a single feed-forward pass through the network, significantly faster than semi-amortized methods.
We show that our approach attains higher test-data likelihood than state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-02-05T13:01:12Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We study prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)