Bias-Free Scalable Gaussian Processes via Randomized Truncations
- URL: http://arxiv.org/abs/2102.06695v1
- Date: Fri, 12 Feb 2021 18:54:10 GMT
- Title: Bias-Free Scalable Gaussian Processes via Randomized Truncations
- Authors: Andres Potapczynski, Luhuan Wu, Dan Biderman, Geoff Pleiss and John P. Cunningham
- Abstract summary: This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF).
CG tends to underfit while RFF tends to overfit.
We address these issues using randomized truncation estimators that eliminate bias in exchange for increased variance.
- Score: 26.985324213848475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scalable Gaussian Process methods are computationally attractive, yet
introduce modeling biases that require rigorous study. This paper analyzes two
common techniques: early truncated conjugate gradients (CG) and random Fourier
features (RFF). We find that both methods introduce a systematic bias on the
learned hyperparameters: CG tends to underfit while RFF tends to overfit. We
address these issues using randomized truncation estimators that eliminate bias
in exchange for increased variance. In the case of RFF, we show that the
bias-to-variance conversion is indeed a trade-off: the additional variance
proves detrimental to optimization. However, in the case of CG, our unbiased
learning procedure meaningfully outperforms its biased counterpart with minimal
additional computation.
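A minimal sketch of the randomized-truncation idea (assuming NumPy; the name randomized_truncation_sum, the precomputed list of increments, and the user-supplied truncation distribution are illustrative, not the paper's exact RR-CG or SS-RFF estimators):

```python
import numpy as np

def randomized_truncation_sum(increments, trunc_probs, rng=None):
    """Unbiased "Russian roulette"-style estimate of sum_j increments[j].

    Draws a random truncation level J with P(J = j) = trunc_probs[j] and
    reweights each kept term by 1 / P(J >= j), so the expectation equals the
    full (untruncated) sum at the cost of extra variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    trunc_probs = np.asarray(trunc_probs, dtype=float)
    tail_probs = trunc_probs[::-1].cumsum()[::-1]     # tail_probs[j] = P(J >= j)
    J = rng.choice(len(trunc_probs), p=trunc_probs)   # random truncation level
    kept = np.asarray(increments[: J + 1], dtype=float)
    return float(np.sum(kept / tail_probs[: J + 1]))

# Example: increments[j] might be the change in a CG-based log-likelihood estimate
# between iterations j-1 and j; averaging such draws recovers the fully converged
# value in expectation without running CG to convergence on every step.
```

Unbiasedness follows because term j is included with probability P(J >= j) and reweighted by its inverse, so every term contributes exactly increments[j] in expectation.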
Related papers
- Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
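As a point of reference for both the main paper and the coupling results above, a minimal random Fourier feature sketch for the RBF kernel (assuming NumPy; the name rff_features, the kernel choice, and the optional orthogonal-block coupling are illustrative assumptions, not the specific couplings studied in the paper above):

```python
import numpy as np

def rff_features(X, n_features=256, lengthscale=1.0, orthogonal=False, rng=None):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale**2)).

    orthogonal=True couples the frequencies into orthogonal blocks (one classic
    variance-reducing coupling); the cited paper studies couplings more generally.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    if orthogonal:
        blocks = []
        while sum(b.shape[0] for b in blocks) < n_features:
            Q, _ = np.linalg.qr(rng.standard_normal((d, d)))             # orthonormal directions
            radii = np.linalg.norm(rng.standard_normal((d, d)), axis=1)  # chi_d-distributed norms
            blocks.append(Q * radii[:, None])                            # rows are marginally N(0, I)
        W = np.vstack(blocks)[:n_features]
    else:
        W = rng.standard_normal((n_features, d))                         # i.i.d. N(0, I) frequencies
    proj = X @ (W / lengthscale).T
    Z = np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)
    return Z

# Z = rff_features(X); Z @ Z.T is an unbiased Monte Carlo estimate of the kernel matrix.
```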
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
- Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [0.8192907805418583]
We show that adaptive stochastic approximation with biased gradients converges to critical points for smooth non-convex functions.
We show how the effect of bias can be reduced by appropriate tuning.
arXiv Detail & Related papers (2024-02-05T10:17:36Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Generalizing Gaussian Smoothing for Random Search [23.381986209234164]
Gaussian smoothing (GS) is a derivative-free optimization algorithm that estimates the gradient of an objective using random perturbations of the current iterate.
We propose choosing the perturbation distribution that minimizes the mean squared error (MSE) of the gradient estimate, yielding distributions with provably smaller MSE.
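A minimal sketch of the baseline Gaussian-smoothing gradient estimator that the entry above generalizes (assuming NumPy; the name gs_gradient, the baseline subtraction of f(x), and the default sample sizes are illustrative assumptions):

```python
import numpy as np

def gs_gradient(f, x, sigma=0.1, n_samples=64, rng=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed objective
    E_u[f(x + sigma * u)], u ~ N(0, I), which approximates grad f(x) for small sigma:

        grad ≈ (1 / sigma) * mean_i[(f(x + sigma * u_i) - f(x)) * u_i]

    Subtracting the baseline f(x) leaves the expectation unchanged but lowers variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    u = rng.standard_normal((n_samples, x.size))
    fx = f(x)
    deltas = np.array([f(x + sigma * ui) - fx for ui in u])
    return (deltas[:, None] * u).mean(axis=0) / sigma
```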
arXiv Detail & Related papers (2022-11-27T04:42:05Z)
- Bayesian Active Learning with Fully Bayesian Gaussian Processes [0.0]
In active learning, where labeled data is scarce or difficult to obtain, neglecting the bias-variance trade-off can cause inefficient querying.
We show that incorporating the bias-variance trade-off in the acquisition functions mitigates unnecessary and expensive data labeling.
arXiv Detail & Related papers (2022-05-20T13:52:04Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the resulting bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives [1.0152838128195465]
We show that Frequent Directions (FD) can be used in the optimization setting through an iterative scheme which yields high-accuracy solutions.
This improves on randomized approaches, which must trade off the cost of computing a new sketch at every iteration against the speed of convergence.
arXiv Detail & Related papers (2020-11-06T21:40:38Z)
- On the Convergence of SGD with Biased Gradients [28.400751656818215]
We analyze the convergence of biased stochastic gradient methods (SGD), where individual updates are corrupted by, for example, compression.
We quantify how the magnitude of the bias impacts the attainable accuracy and the convergence rates.
arXiv Detail & Related papers (2020-07-31T19:37:59Z)
- Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
arXiv Detail & Related papers (2020-02-16T09:01:18Z)