$\beta$-Cores: Robust Large-Scale Bayesian Data Summarization in the
Presence of Outliers
- URL: http://arxiv.org/abs/2008.13600v2
- Date: Mon, 9 Nov 2020 10:25:11 GMT
- Title: $\beta$-Cores: Robust Large-Scale Bayesian Data Summarization in the
Presence of Outliers
- Authors: Dionysis Manousakas and Cecilia Mascolo
- Abstract summary: The quality of classic Bayesian inference depends critically on whether observations conform with the assumed data generating model.
We propose a variational inference method that, in a principled way, can simultaneously scale to large datasets and robustify the inferred posterior against outliers.
We illustrate the applicability of our approach on diverse simulated and real datasets and various statistical models.
- Score: 14.918826474979587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern machine learning applications should be able to address the intrinsic
challenges arising over inference on massive real-world datasets, including
scalability and robustness to outliers. Despite the multiple benefits of
Bayesian methods (such as uncertainty-aware predictions, incorporation of
expert knowledge, and hierarchical modeling), the quality of classic Bayesian
inference depends critically on whether observations conform with the assumed
data generating model, which is impossible to guarantee in practice. In this
work, we propose a variational inference method that, in a principled way, can
simultaneously scale to large datasets, and robustify the inferred posterior
with respect to the existence of outliers in the observed data. Reformulating
Bayes' theorem via the $\beta$-divergence, we posit a robustified
pseudo-Bayesian posterior as the target of inference. Moreover, relying on the
recent formulations of Riemannian coresets for scalable Bayesian inference, we
propose a sparse variational approximation of the robustified posterior and an
efficient stochastic black-box algorithm to construct it. Overall, our method
allows releasing cleansed data summaries that can be applied broadly in
scenarios including structured data corruption. We illustrate the applicability
of our approach on diverse simulated and real datasets and various statistical
models, including Gaussian mean inference, logistic regression, and neural linear
regression, demonstrating its superiority to existing Bayesian summarization
methods in the presence of outliers.
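For orientation, the robustified pseudo-posterior the abstract refers to can be written down directly. Assuming the standard density-power ($\beta$-divergence) form that this construction builds on (consult the paper for its exact scaling conventions), the target of inference is

$$
\pi_{\beta}(\theta \mid x_{1:N}) \;\propto\; \pi_0(\theta)\, \exp\!\Big( \sum_{n=1}^{N} f_n^{(\beta)}(\theta) \Big),
\qquad
f_n^{(\beta)}(\theta) \;=\; \frac{1}{\beta}\, p(x_n \mid \theta)^{\beta} \;-\; \frac{1}{\beta+1} \int p(y \mid \theta)^{\beta+1}\, dy,
$$

where $\pi_0$ is the prior. Each $f_n^{(\beta)}$ is bounded in $x_n$, so no single observation can dominate the posterior, and as $\beta \to 0$ it recovers the log-likelihood up to additive constants, restoring standard Bayes.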
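A runnable toy illustration of why this loss resists outliers, in the paper's simplest setting (Gaussian mean inference). This is a minimal sketch of the robustified loss only, with an invented contamination setup; it does not implement the paper's coreset construction:

import numpy as np
from scipy.optimize import minimize_scalar

def beta_loss(x, theta, sigma=1.0, beta=0.5):
    # Per-observation beta-divergence loss for a N(theta, sigma^2) model.
    # Unlike the negative log-likelihood, it is bounded in x, so gross
    # outliers contribute almost nothing to the objective.
    dens = np.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    # closed form of the integral of N(y; theta, sigma^2)^(beta + 1) over y
    integral = (2 * np.pi * sigma ** 2) ** (-beta / 2) / np.sqrt(beta + 1)
    return -dens ** beta / beta + integral / (beta + 1)

# toy contaminated dataset (illustrative, not from the paper)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 950),   # inliers around 0
                    rng.normal(20.0, 1.0, 50)])  # 5% gross outliers

mle = x.mean()  # the Gaussian MLE is dragged towards the outliers
robust = minimize_scalar(lambda t: beta_loss(x, t).sum(),
                         bounds=(-10.0, 10.0), method="bounded").x
print(f"MLE: {mle:.2f}   beta-robust estimate: {robust:.2f}")  # latter stays near 0

Roughly speaking, the paper's coreset construction then reweights a small subset of such per-point losses so that the sparse weighted sum mimics the full robustified objective.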
Related papers
- Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models [0.0]
We show how diffusion-based models can be repurposed for performing principled, identifiable Bayesian inference.
We show how such maps can be learned via standard DBM training using a novel noise schedule.
The result is a class of highly expressive generative models, uniquely defined on a low-dimensional latent space.
arXiv Detail & Related papers (2024-07-11T19:58:19Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) can approximate the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Adversarial robustness of amortized Bayesian inference [3.308743964406687]
The idea of amortized Bayesian inference is to invest computational cost upfront in training an inference network on simulated data, so that inference on new observations is fast.
We show that almost unrecognizable, targeted perturbations of the observations can lead to drastic changes in the predicted posterior and highly unrealistic posterior predictive samples.
We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator.
arXiv Detail & Related papers (2023-05-24T10:18:45Z)
- Bayesian Imaging With Data-Driven Priors Encoded by Neural Networks: Theory, Methods, and Algorithms [2.266704469122763]
This paper proposes a new methodology for performing Bayesian inference in imaging inverse problems where the prior knowledge is available in the form of training data.
We establish the existence and well-posedness of the associated posterior moments under easily verifiable conditions.
A model accuracy analysis suggests that the Bayesian probabilities reported by the data-driven models are also remarkably accurate under a frequentist definition.
arXiv Detail & Related papers (2021-03-18T11:34:08Z)
- Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination.
Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD); a minimal sketch of the TVD discrepancy follows this entry.
We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real-world data.
arXiv Detail & Related papers (2020-10-26T09:53:06Z)
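The discrepancy this entry is built on is elementary to compute; below is a minimal sketch of the TVD between an empirical and a model pmf (the zero-inflated data and the Poisson model are illustrative choices, not the paper's experiments):

import numpy as np
from scipy.stats import poisson

def tvd(counts, model_pmf):
    # Total Variation Distance between the empirical pmf and a model pmf:
    # TVD = 0.5 * sum_k |p_hat(k) - q(k)|. It is bounded by 1, and a
    # contaminated cell can move it by at most that cell's probability mass.
    support = np.arange(len(counts))  # support truncated to the observed range
    p_hat = counts / counts.sum()
    return 0.5 * np.abs(p_hat - model_pmf(support)).sum()

rng = np.random.default_rng(1)
# zero-inflated sample: 20% structural zeros on top of Poisson(2) counts
data = np.where(rng.random(1000) < 0.2, 0, rng.poisson(2.0, 1000))
counts = np.bincount(data, minlength=15)
print(tvd(counts, lambda k: poisson.pmf(k, mu=2.0)))  # zero-inflation shows up as a large but bounded discrepancy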
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure (the divergence itself is written out after this entry).
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)
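For reference, one common parameterization of the $\gamma$-divergence that the $\gamma$-ABC entry above relies on, written between a data density $g$ and a model density $f$ (the paper's nearest-neighbor estimator of this quantity is its own contribution; see the paper for it):

$$
D_{\gamma}(g \,\|\, f) \;=\; \frac{1}{\gamma(1+\gamma)} \log \int g(x)^{1+\gamma}\, dx
\;-\; \frac{1}{\gamma} \log \int g(x)\, f(x)^{\gamma}\, dx
\;+\; \frac{1}{1+\gamma} \log \int f(x)^{1+\gamma}\, dx .
$$

Like the $\beta$-divergence above, it caps the influence of low-probability (outlying) regions, which is what makes the resulting ABC procedure outlier-robust.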
- Scaling Bayesian inference of mixed multinomial logit models to very large datasets [9.442139459221785]
We propose an Amortized Variational Inference approach that leverages backpropagation, automatic differentiation and GPU-accelerated computation.
We show how normalizing flows can be used to increase the flexibility of the variational posterior approximations.
arXiv Detail & Related papers (2020-04-11T15:30:47Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)