Interaction Models and Generalized Score Matching for Compositional Data
- URL: http://arxiv.org/abs/2109.04671v1
- Date: Fri, 10 Sep 2021 05:29:41 GMT
- Title: Interaction Models and Generalized Score Matching for Compositional Data
- Authors: Shiqing Yu, Mathias Drton, Ali Shojaie
- Abstract summary: We propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex.
Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions.
A high-dimensional analysis of our estimation methods shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.
- Score: 9.797319790710713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applications such as the analysis of microbiome data have led to renewed
interest in statistical methods for compositional data, i.e., multivariate data
in the form of probability vectors that contain relative proportions. In
particular, there is considerable interest in modeling interactions among such
relative proportions. To this end we propose a class of exponential family
models that accommodate general patterns of pairwise interaction while being
supported on the probability simplex. Special cases include the family of
Dirichlet distributions as well as Aitchison's additive logistic normal
distributions. Generally, the distributions we consider have a density that
features a difficult-to-compute normalizing constant. To circumvent this issue,
we design effective estimation methods based on generalized versions of score
matching. A high-dimensional analysis of our estimation methods shows that the
simplex domain is handled as efficiently as previously studied full-dimensional
domains.
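The abstract's central point is that score matching avoids the normalizing constant. The simplest special case it names makes this concrete. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It applies the classical (non-generalized) Hyvärinen score matching to the additive log-ratio (ALR) coordinates of an additive logistic normal sample. In those coordinates the domain is all of $\mathbb{R}^{p-1}$, so no boundary correction is needed. The function names `alr`, `alr_inverse`, `score_matching_loss` and `score_matching_fit` are hypothetical. The objective uses only the gradient and the Laplacian of the unnormalized log-density $-\tfrac12 y^\top K y + h^\top y$, so the Gaussian normalizing constant never appears. For this model the minimizer has the closed form $K = S^{-1}$, $h = K\bar y$, where $S$ is the sample covariance.

```python
import numpy as np


def alr(x):
    """Additive log-ratio transform: y_j = log(x_j / x_p) for j = 1..p-1."""
    return np.log(x[:, :-1] / x[:, -1:])


def alr_inverse(y):
    """Map points of R^{p-1} to the open simplex (append 0, then softmax)."""
    z = np.hstack([y, np.zeros((y.shape[0], 1))])
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def score_matching_loss(K, h, y):
    """Hyvarinen score-matching objective for log p(y) = -0.5 y'Ky + h'y + const.

    Only the score grad log p(y) = h - K y (K symmetric) and the Laplacian
    -trace(K) enter, so the normalizing constant is never computed.
    """
    score = h[None, :] - y @ K.T  # row i holds h - K y_i
    return 0.5 * np.mean(np.sum(score ** 2, axis=1)) - np.trace(K)


def score_matching_fit(y):
    """Closed-form minimizer of score_matching_loss for this Gaussian model."""
    ybar = y.mean(axis=0)
    S = np.cov(y, rowvar=False, bias=True)  # 1/n sample covariance
    K = np.linalg.inv(S)
    return K, K @ ybar


# Usage: simulate an additive logistic normal sample on the 3-part simplex.
rng = np.random.default_rng(0)
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
x = alr_inverse(rng.multivariate_normal(mu, Sigma, size=20000))
K_hat, h_hat = score_matching_fit(alr(x))
print(np.round(K_hat, 2))  # should be close to inv(Sigma)
print(np.round(np.linalg.inv(Sigma), 2))
```

The general interaction models in the abstract are supported on the simplex itself. The authors handle that domain with generalized versions of score matching rather than with this unconstrained variant.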
Related papers
- Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data [2.6499018693213316]
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables.
The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle.
Our approach has strong motivation from first principles of modeling coupled discrete variables.
(2024-06-06T21:58:33Z)
- Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling [0.0]
Density estimation is a fundamental technique employed in various fields to model and to understand the underlying distribution of data.
In this paper we propose the mono-variate approximation of the density using quasi-interpolation.
The presented algorithm is validated on artificial and real datasets.
(2024-02-18T11:49:38Z)
- Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered.
(2023-06-19T17:26:34Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the sample regime and in the finite regime.
(2022-10-03T06:09:01Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
(2022-02-23T06:11:49Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
(2021-12-02T14:45:16Z)
- A method to integrate and classify normal distributions [0.0]
We present results and open-source software that provide the probability in any domain of a normal distribution in any dimension with any parameters.
We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.
(2020-12-23T05:45:41Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
(2020-07-21T08:18:06Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
(2020-06-22T21:12:31Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
(2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.