Flow-Based Likelihoods for Non-Gaussian Inference
- URL: http://arxiv.org/abs/2007.05535v2
- Date: Fri, 6 Nov 2020 17:33:17 GMT
- Title: Flow-Based Likelihoods for Non-Gaussian Inference
- Authors: Ana Diaz Rivero and Cora Dvorkin
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the use of data-driven likelihoods to bypass a key assumption
made in many scientific analyses, which is that the true likelihood of the data
is Gaussian. In particular, we suggest using the optimization targets of
flow-based generative models, a class of models that can capture complex
distributions by transforming a simple base distribution through layers of
nonlinearities. We call these flow-based likelihoods (FBL). We analyze the
accuracy and precision of the reconstructed likelihoods on mock Gaussian data,
and show that simply gauging the quality of samples drawn from the trained
model is not a sufficient indicator that the true likelihood has been learned.
We nevertheless demonstrate that the likelihood can be reconstructed to a
precision equal to that of sampling error due to a finite sample size. We then
apply FBLs to mock weak lensing convergence power spectra, a cosmological
observable that is significantly non-Gaussian (NG). We find that the FBL
captures the NG signatures in the data extremely well, while other commonly
used data-driven likelihoods, such as Gaussian mixture models and independent
component analysis, fail to do so. This suggests that works that have found
small posterior shifts in NG data with data-driven likelihoods such as these
could be underestimating the impact of non-Gaussianity in parameter
constraints. By introducing a suite of tests that can capture different levels
of NG in the data, we show that the success or failure of traditional
data-driven likelihoods can be tied back to the structure of the NG in the
data. Unlike other methods, the flexibility of the FBL makes it successful at
tackling different types of NG simultaneously. Because of this, and
consequently their likely applicability across datasets and domains, we
encourage their use for inference when sufficient mock data are available for
training.
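The core idea behind an FBL, using the change-of-variables optimization target of a flow as an exact likelihood, can be sketched in a few lines. The single arcsinh layer and the parameter values `mu` and `s` below are illustrative assumptions for a one-dimensional toy case, not the architecture used in the paper:

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))

def flow_loglik(x, mu=0.5, s=1.2):
    """Exact log-likelihood of x under a toy one-layer flow.

    The flow maps data to the base space via z = arcsinh((x - mu) / s),
    an invertible elementwise transform, so the change of variables gives
    log p(x) = log N(z; 0, 1) + log |dz/dx|.
    """
    u = (x - mu) / s
    z = np.arcsinh(u)
    # d arcsinh(u)/du = 1 / sqrt(1 + u^2), and du/dx = 1/s.
    log_det = -0.5 * np.log1p(u ** 2) - np.log(s)
    return standard_normal_logpdf(z) + log_det

# Sanity check: the flow defines a proper density that integrates to ~1,
# which is exactly what makes its training objective a true likelihood.
xs = np.linspace(-60.0, 60.0, 200_001)
dx = xs[1] - xs[0]
total_mass = float(np.sum(np.exp(flow_loglik(xs))) * dx)
print(f"integrated probability mass: {total_mass:.4f}")
```

Maximizing this log-likelihood over the transform's parameters is the flow's training objective; because the density is exact rather than approximated, the trained objective can itself be reused as a data-driven likelihood in downstream inference.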
Related papers
- A Priori Uncertainty Quantification of Reacting Turbulence Closure Models using Bayesian Neural Networks (2024-02-28)
  We employ Bayesian neural networks (BNNs) to capture uncertainties in a reacting flow model. We demonstrate that BNN models can provide unique insights into the structure of uncertainty in data-driven closure models. The efficacy of the model is demonstrated by a priori evaluation on a dataset consisting of a variety of flame conditions and fuels.
- Learning Multivariate CDFs and Copulas using Tensor Factorization (2022-10-13)
  Learning the multivariate distribution of data is a core challenge in statistics and machine learning. In this work, we aim to learn multivariate cumulative distribution functions (CDFs), as they can handle mixed random variables. We show that any grid-sampled version of a joint CDF of mixed random variables admits a universal representation as a naive Bayes model. We demonstrate the superior performance of the proposed model on several synthetic and real datasets and applications, including regression, sampling, and data imputation.
- Learning from aggregated data with a maximum entropy model (2022-10-05)
  We show how a new model, similar to logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis. We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full, unaggregated data.
- ManiFlow: Implicitly Representing Manifolds with Normalizing Flows (2022-08-18)
  Normalizing flows (NFs) are flexible explicit generative models that have been shown to accurately model complex real-world data distributions. We propose an optimization objective that recovers the most likely point on the manifold given a sample from the perturbed distribution. Finally, we focus on 3D point clouds, for which we exploit the explicit nature of NFs: surface normals extracted from the gradient of the log-likelihood, and the log-likelihood itself.
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality (2021-06-07)
  We show that the exact Bayes error of generative models learned using normalizing flows can be computed. We use this approach to conduct a thorough investigation of state-of-the-art classification models.
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning (2021-04-11)
  Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties. Our work shows that marginal likelihoods can improve generalization and are useful when validation data are unavailable.
- Copula Flows for Synthetic Data Generation (2021-01-03)
  We propose using a probabilistic model as a synthetic data generator. We benchmark our method on both simulated and real datasets in terms of density estimation.
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift (2020-06-26)
  We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method by transferring prognostic models of prostate cancer across globally diverse populations.
- Data-driven learning of robust nonlocal physics from high-fidelity synthetic data (2020-05-17)
  A key challenge for nonlocal models is the analytical complexity of deriving them from first principles, and their use is frequently justified only a posteriori. In this work we extract nonlocal models from data, circumventing these challenges and providing a data-driven justification for the resulting model form.
- Semi-Supervised Learning with Normalizing Flows (2019-12-30)
  FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows. We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.