Self-Distillation for Gaussian Process Regression and Classification
- URL: http://arxiv.org/abs/2304.02641v1
- Date: Wed, 5 Apr 2023 17:59:20 GMT
- Title: Self-Distillation for Gaussian Process Regression and Classification
- Authors: Kenneth Borup and Lars Nørvang Andersen
- Abstract summary: We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC): data-centric and distribution-centric.
The data-centric approach resembles most current distillation techniques in machine learning and refits a model on deterministic predictions from the teacher.
The distribution-centric approach reuses the full probabilistic posterior for the next iteration.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose two approaches to extend the notion of knowledge distillation to
Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC);
data-centric and distribution-centric. The data-centric approach resembles most
current distillation techniques for machine learning, and refits a model on
deterministic predictions from the teacher, while the distribution-centric
approach reuses the full probabilistic posterior for the next iteration. By
analyzing the properties of these approaches, we show that the data-centric
approach for GPR closely relates to known results for self-distillation of
kernel ridge regression and that the distribution-centric approach for GPR
corresponds to ordinary GPR with a very particular choice of hyperparameters.
Furthermore, we demonstrate that the distribution-centric approach for GPC
approximately corresponds to data duplication and a particular scaling of the
covariance and that the data-centric approach for GPC requires redefining the
model from a Binomial likelihood to a continuous Bernoulli likelihood to be
well-specified. To the best of our knowledge, our proposed approaches are the
first to formulate knowledge distillation specifically for Gaussian Process
models.
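As a rough illustration of the two approaches, consider the following sketches (not the authors' code; the kernel, noise level, and toy data below are assumptions for illustration only). The data-centric step fits a teacher and then refits a fresh student on the teacher's deterministic posterior-mean predictions, here using scikit-learn's GaussianProcessRegressor:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)

kernel = RBF(length_scale=1.0)
teacher = GaussianProcessRegressor(kernel=kernel, alpha=0.1).fit(X, y)

# Data-centric distillation step: the student is refit on the
# teacher's deterministic (posterior mean) predictions.
y_teacher = teacher.predict(X)
student = GaussianProcessRegressor(kernel=kernel, alpha=0.1).fit(X, y_teacher)
```

The distribution-centric iteration instead carries the full posterior forward: the posterior mean and covariance at the training inputs become the prior of the next round. A minimal NumPy sketch of that loop, using the exact GPR update:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    # Squared-exponential kernel matrix between two input vectors.
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def gpr_posterior(m_prior, K_prior, y, noise_var):
    # Exact GP regression update at the training inputs:
    #   m_post = m + K (K + s^2 I)^{-1} (y - m)
    #   K_post = K - K (K + s^2 I)^{-1} K
    A = K_prior + noise_var * np.eye(len(y))
    m_post = m_prior + K_prior @ np.linalg.solve(A, y - m_prior)
    K_post = K_prior - K_prior @ np.linalg.solve(A, K_prior)
    return m_post, K_post

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)

# Distribution-centric distillation: the full posterior (mean and
# covariance) is reused as the prior for the next iteration.
m, K = np.zeros_like(y), rbf(X, X)
for _ in range(3):
    m, K = gpr_posterior(m, K, y, noise_var=0.1)
```

Per the abstract, iterating this recursion is equivalent to a single ordinary GPR fit with a very particular choice of hyperparameters, so the loop above is only meant to make the recursion concrete.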
Related papers
- A Generalized Unified Skew-Normal Process with Neural Bayes Inference [1.5388334141379898]
In recent decades, statisticians have been encountering spatial data that exhibit non-Gaussian behaviors such as asymmetry and heavy-tailedness.
To address the limitations of Gaussian models, a variety of skewed models have been proposed, and their popularity has grown rapidly.
Among various proposals in the literature, unified skewed distributions, such as the Unified Skew-Normal (SUN), have received considerable attention.
arXiv Detail & Related papers (2024-11-26T13:00:39Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Explainable Learning with Gaussian Processes [23.796560256071473]
We take a principled approach to defining attributions under model uncertainty, extending the existing literature.
We show that although GPR is a highly flexible and non-parametric approach, we can derive interpretable, closed-form expressions for the feature attributions.
We also show that, when applicable, the exact expressions for GPR attributions are both more accurate and less computationally expensive than the approximations currently used in practice.
arXiv Detail & Related papers (2024-03-11T18:03:02Z) - Sparse Variational Contaminated Noise Gaussian Process Regression with Applications in Geomagnetic Perturbations Forecasting [4.675221539472143]
We propose a scalable inference algorithm for fitting sparse Gaussian process regression models with contaminated normal noise on large datasets.
We show that our approach yields shorter prediction intervals for similar coverage and accuracy when compared to an artificial dense neural network baseline.
arXiv Detail & Related papers (2024-02-27T15:08:57Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A MOGP prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes [8.4159776055506]
We propose a novel approach for aggregating the Gaussian experts' predictions by a Gaussian graphical model (GGM).
We first estimate the joint distribution of latent and observed variables using the Expectation-Maximization (EM) algorithm.
Our new method outperforms other state-of-the-art DGP approaches.
arXiv Detail & Related papers (2022-02-07T15:22:56Z) - Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z) - Gaussian Process Regression with Local Explanation [28.90948136731314]
We propose GPR with local explanation, which reveals the feature contributions to the prediction of each sample.
In the proposed model, both the prediction and explanation for each sample are performed using an easy-to-interpret locally linear model.
For a new test sample, the proposed model can predict the values of its target variable and weight vector, as well as their uncertainties.
arXiv Detail & Related papers (2020-07-03T13:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.