Multi-class Gaussian Process Classification with Noisy Inputs
- URL: http://arxiv.org/abs/2001.10523v3
- Date: Wed, 30 Dec 2020 13:41:55 GMT
- Title: Multi-class Gaussian Process Classification with Noisy Inputs
- Authors: Carlos Villacampa-Calvo, Bryan Zaldivar, Eduardo C. Garrido-Merchán,
Daniel Hernández-Lobato
- Abstract summary: In some situations, the amount of input noise can be known beforehand.
We have evaluated the proposed methods in several experiments involving synthetic and real data.
- Score: 2.362412515574206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is a common practice in the machine learning community to assume that the
observed data are noise-free in the input attributes. Nevertheless, scenarios
with input noise are common in real problems, as measurements are never
perfectly accurate. If this input noise is not taken into account, a supervised
machine learning method is expected to perform sub-optimally. In this paper, we
focus on multi-class classification problems and use Gaussian processes (GPs)
as the underlying classifier. Motivated by a data set coming from the
astrophysics domain, we hypothesize that the observed data may contain noise in
the inputs. Therefore, we devise several multi-class GP classifiers that can
account for input noise. Such classifiers can be efficiently trained using
variational inference to approximate the posterior distribution of the latent
variables of the model. Moreover, in some situations, the amount of noise can
be known beforehand. If this is the case, it can be readily introduced into the
proposed methods, and this prior information is expected to lead to better
performance. We have evaluated the proposed methods in several experiments
involving synthetic and real data. These include several
data sets from the UCI repository, the MNIST data set and a data set coming
from astrophysics. The results obtained show that, although the classification
error is similar across methods, the predictive distribution of the proposed
methods is better, in terms of the test log-likelihood, than the predictive
distribution of a classifier based on GPs that ignores input noise.
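The paper's exact variational treatment of input noise is not reproduced here, but the core idea of exploiting a known noise level admits a simple illustration. The sketch below is a hypothetical stand-in, not the authors' method: it trains scikit-learn's GaussianProcessClassifier (which handles multiple classes via one-vs-rest) and accounts for a known input-noise level at prediction time by Monte Carlo averaging the predictive distribution over resampled inputs. The toy data, the noise level, and the predict_proba_noisy helper are all illustrative assumptions.

```python
# A minimal, illustrative sketch (NOT the paper's variational scheme): when the
# input-noise level is known beforehand, one simple way to account for it is to
# average the predictive distribution over Monte Carlo samples of the inputs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical 3-class toy data; each input is assumed to have been measured
# with known i.i.d. Gaussian noise of standard deviation `input_noise_std`.
X = rng.normal(size=(150, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 0] > 1.0).astype(int)
input_noise_std = 0.1

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)

def predict_proba_noisy(model, X_test, noise_std, n_samples=100):
    """Average class probabilities over samples of the noisy test inputs."""
    probs = np.zeros((X_test.shape[0], len(model.classes_)))
    for _ in range(n_samples):
        jitter = rng.normal(scale=noise_std, size=X_test.shape)
        probs += model.predict_proba(X_test + jitter)
    return probs / n_samples

X_test = rng.normal(size=(5, 2))
print(predict_proba_noisy(gpc, X_test, input_noise_std))
```

Averaging over noisy copies of the inputs softens the class probabilities, which is consistent with the paper's finding that modelling input noise improves the predictive distribution (test log-likelihood) even when the raw classification error barely changes.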
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Understanding Noise-Augmented Training for Randomized Smoothing [14.061680807550722]
Randomized smoothing is a technique for providing provable robustness guarantees against adversarial attacks.
We show that, without making stronger distributional assumptions, no benefit can be expected from predictors trained with noise-augmentation.
Our analysis has direct implications for the practical deployment of randomized smoothing (a minimal prediction-time sketch appears after this list).
arXiv Detail & Related papers (2023-05-08T14:46:34Z) - Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z) - Improving the Robustness of Summarization Models by Detecting and
Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Uncertainty quantification for multiclass data description [0.1611401281366893]
We propose a multiclass data description model based on kernel Mahalanobis distance (MDD-KM).
We report a prototypical classification system based on a hierarchical linear dynamical system with MDD-KM as a component.
arXiv Detail & Related papers (2021-08-29T14:42:04Z) - The information of attribute uncertainties: what convolutional neural
networks can learn about errors in input data [0.0]
We show how Convolutional Neural Networks (CNNs) are able to learn about the context and patterns of signal and noise.
We show that, when each data point is subject to different levels of noise, that information can be learned by the CNNs.
arXiv Detail & Related papers (2021-08-10T15:10:46Z) - Estimating g-Leakage via Machine Learning [34.102705643128004]
This paper considers the problem of estimating the information leakage of a system in the black-box scenario.
It is assumed that the system's internals are unknown to the learner, or anyway too complicated to analyze.
We propose a novel approach to perform black-box estimation of the g-vulnerability using Machine Learning (ML) algorithms.
arXiv Detail & Related papers (2020-05-09T09:26:36Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
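For the randomized-smoothing entry above: the smoothed classifier predicts g(x) = argmax_c P(f(x + eps) = c) with eps ~ N(0, sigma^2 I), typically estimated by a majority vote over Monte Carlo samples. The sketch below shows only this prediction step, not the certified radius or the noise-augmented training analysed in that paper; the base classifier base_predict is a hypothetical stand-in.

```python
# A minimal sketch of randomized-smoothing prediction: a majority vote of a
# base classifier over Gaussian perturbations of the input.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def base_predict(x: np.ndarray) -> int:
    """Hypothetical base classifier: thresholds the first coordinate."""
    return int(x[0] > 0)

def smoothed_predict(x: np.ndarray, sigma: float, n_samples: int = 1000) -> int:
    """Estimate g(x) = argmax_c P(base(x + eps) = c), eps ~ N(0, sigma^2 I),
    by a majority vote over Monte Carlo samples."""
    votes = Counter(
        base_predict(x + rng.normal(scale=sigma, size=x.shape))
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]

print(smoothed_predict(np.array([0.3, -1.2]), sigma=0.5))
```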
This list is automatically generated from the titles and abstracts of the papers in this site.