Expectation propagation on the diluted Bayesian classifier
- URL: http://arxiv.org/abs/2009.09545v2
- Date: Sun, 31 Jan 2021 00:46:21 GMT
- Title: Expectation propagation on the diluted Bayesian classifier
- Authors: Alfredo Braunstein, Thomas Gueudr\'e, Andrea Pagnani and Mirko
Pieropan
- Abstract summary: We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification.
A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule.
EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient feature selection from high-dimensional datasets is a very
important challenge in many data-driven fields of science and engineering. We
introduce a statistical mechanics inspired strategy that addresses the problem
of sparse feature selection in the context of binary classification by
leveraging a computational scheme known as expectation propagation (EP). The
algorithm is used in order to train a continuous-weights perceptron learning a
classification rule from a set of (possibly partly mislabeled) examples
provided by a teacher perceptron with diluted continuous weights. We test the
method in the Bayes optimal setting under a variety of conditions and compare
it to other state-of-the-art algorithms based on message passing and on
expectation maximization approximate inference schemes. Overall, our
simulations show that EP is a robust and competitive algorithm in terms of
variable selection properties, estimation accuracy and computational
complexity, especially when the student perceptron is trained from correlated
patterns that prevent other iterative methods from converging. Furthermore, our
numerical tests demonstrate that the algorithm is capable of learning online
the unknown values of prior parameters, such as the dilution level of the
weights of the teacher perceptron and the fraction of mislabeled examples,
quite accurately. This is achieved by means of a simple maximum likelihood
strategy that consists in minimizing the free energy associated with the EP
algorithm.
Related papers
- A sparse PAC-Bayesian approach for high-dimensional quantile prediction [0.0]
This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction.
It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation.
Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
arXiv Detail & Related papers (2024-09-03T08:01:01Z) - Learning to sample fibers for goodness-of-fit testing [0.0]
We consider the problem of constructing exact goodness-of-fit tests for discrete exponential family models.
We translate the problem into a Markov decision process and demonstrate a reinforcement learning approach for learning good moves' for sampling.
Our algorithm is based on an actor-critic sampling scheme, with provable convergence.
arXiv Detail & Related papers (2024-05-22T19:33:58Z) - Efficient and Near-Optimal Smoothed Online Learning for Generalized
Linear Functions [28.30744223973527]
We give a computationally efficient algorithm that is the first to enjoy the statistically optimal log(T/sigma) regret for realizable K-wise linear classification.
We develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers.
arXiv Detail & Related papers (2022-05-25T21:31:36Z) - Scalable computation of prediction intervals for neural networks via
matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z) - Efficient and Differentiable Conformal Prediction with General Function
Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective of graph contrastive learning methods showing random augmentations leads to encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via
Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - On Last-Layer Algorithms for Classification: Decoupling Representation
from Uncertainty Estimation [27.077741143188867]
We propose a family of algorithms which split the classification task into two stages: representation learning and uncertainty estimation.
We evaluate their performance in terms of emphselective classification (risk-coverage), and their ability to detect out-of-distribution samples.
arXiv Detail & Related papers (2020-01-22T15:08:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.