Subset-of-Data Variational Inference for Deep Gaussian-Processes
Regression
- URL: http://arxiv.org/abs/2107.08265v1
- Date: Sat, 17 Jul 2021 15:55:35 GMT
- Title: Subset-of-Data Variational Inference for Deep Gaussian-Processes
Regression
- Authors: Ayush Jain (1), P. K. Srijith (1) and Mohammad Emtiyaz Khan (2) ((1)
Department of Computer Science and Engineering, Indian Institute of
Technology Hyderabad, India, (2) RIKEN Center for AI Project, Tokyo, Japan)
- Abstract summary: Deep Gaussian Processes (DGPs) are multi-layer, flexible extensions of Gaussian processes.
Sparse approximations simplify the training but often require optimization over a large number of inducing inputs and their locations.
In this paper, we simplify the training by setting the locations to a fixed subset of data and sampling the inducing inputs from a variational distribution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Gaussian Processes (DGPs) are multi-layer, flexible extensions of
Gaussian processes but their training remains challenging. Sparse
approximations simplify the training but often require optimization over a
large number of inducing inputs and their locations across layers. In this
paper, we simplify the training by setting the locations to a fixed subset of
data and sampling the inducing inputs from a variational distribution. This
reduces the trainable parameters and computation cost without significant
performance degradations, as demonstrated by our empirical results on
regression problems. Our modifications simplify and stabilize DGP training
while making it amenable to sampling schemes for setting the inducing inputs.
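To make this concrete, here is a minimal single-layer NumPy sketch of the idea described above: fix the inducing locations to a random subset of the training inputs and sample the inducing outputs from a variational distribution q(u) = N(m, L Lᵀ). This is an illustrative sketch under assumptions, not the authors' implementation; the kernel choice, the helper rbf_kernel, and the parameters m and L are made up for illustration, and the ELBO optimization and layer stacking of a full DGP are omitted.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)

# Toy 1-D regression data (the likelihood/ELBO term over y is omitted here).
N, M = 200, 20                              # training points, inducing points
X = rng.uniform(-3.0, 3.0, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)

# Fix the inducing locations Z to a random subset of the data
# (no optimization over the locations).
Z = X[rng.choice(N, size=M, replace=False)]

# Variational distribution q(u) = N(m, L L^T) over the inducing outputs at Z.
# In a full implementation, m and L would be the trainable variational
# parameters of this layer; here they are only initialized.
m = np.zeros(M)
L = np.eye(M)

# Sample u ~ q(u) and propagate it through the standard sparse-GP conditional
# mean, E[f(X) | u] = K_xz K_zz^{-1} u, which would feed the next DGP layer.
jitter = 1e-6
Kzz = rbf_kernel(Z, Z) + jitter * np.eye(M)
Kxz = rbf_kernel(X, Z)
u = m + L @ rng.standard_normal(M)
f = Kxz @ np.linalg.solve(Kzz, u)

print("inducing subset:", Z.shape, "| sampled layer output:", f.shape)
```

Because Z is a fixed subset of X, only m, L, and the kernel hyperparameters would need to be trained per layer, which is where the reduction in trainable parameters mentioned in the abstract comes from.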
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address the challenges of variational inference in Gaussian process latent variable models.
We combine the strengths of Sequential Monte Carlo samplers and variational inference (VI) to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
- Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Variational Sparse Coding with Learned Thresholding [6.737133300781134]
We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples.
We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation.
arXiv Detail & Related papers (2022-05-07T14:49:50Z)
- Active Learning for Deep Gaussian Process Surrogates [0.3222802562733786]
Deep Gaussian processes (DGPs) are increasingly popular as predictive models in machine learning (ML).
Here we explore DGPs as surrogates for computer simulation experiments whose response surfaces exhibit similar characteristics.
We build up the design sequentially, limiting expensive evaluations of the simulator code and mitigating the cubic cost of DGP inference.
arXiv Detail & Related papers (2020-12-15T00:09:37Z)
- Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem with a new regularization method based on distributionally robust optimization.
During training, samples are selected according to their accuracy so that the worst-performing samples contribute the most to the optimization.
arXiv Detail & Related papers (2020-06-04T09:46:52Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale-variation challenge in object detection.
Experimental results demonstrate the efficacy of the proposed DST in handling scale variation.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
- Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as 'least realistic'.
We show that this 'top-k update' procedure is a generally applicable improvement (see the sketch after this list).
arXiv Detail & Related papers (2020-02-14T19:27:50Z)
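As a concrete reading of the 'top-k update' described in the last entry, here is a short hypothetical PyTorch sketch (not the authors' code) that zeroes out the generator-loss contributions of the batch elements the critic scores as least realistic; the WGAN-style form of the generator objective is an assumption for illustration.

```python
import torch

def topk_generator_loss(critic_scores: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k most-realistic generated samples (highest critic score)
    in the generator loss; the remaining elements contribute zero gradient."""
    batch_size = critic_scores.shape[0]
    _, keep = torch.topk(critic_scores, k)      # indices of the top-k samples
    mask = torch.zeros_like(critic_scores)
    mask[keep] = 1.0
    # WGAN-style generator objective restricted to the surviving samples;
    # masked-out elements are multiplied by zero and add no gradient.
    return -(mask * critic_scores).sum() / batch_size

# Example: critic scores for 64 generated samples, keep the top 48.
loss = topk_generator_loss(torch.randn(64), k=48)
```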