Review of Probability Distributions for Modeling Count Data
- URL: http://arxiv.org/abs/2001.04343v1
- Date: Fri, 10 Jan 2020 18:28:19 GMT
- Title: Review of Probability Distributions for Modeling Count Data
- Authors: F. William Townes
- Abstract summary: Generalized linear models enable direct modeling of counts in a regression context.
When counts contain only relative information, multinomial or Dirichlet-multinomial models can be more appropriate.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Count data take on non-negative integer values and are challenging to
properly analyze using standard linear-Gaussian methods such as linear
regression and principal components analysis. Generalized linear models enable
direct modeling of counts in a regression context using distributions such as
the Poisson and negative binomial. When counts contain only relative
information, multinomial or Dirichlet-multinomial models can be more
appropriate. We review some of the fundamental connections between multinomial
and count models from probability theory, providing detailed proofs. These
relationships are useful for methods development in applications such as topic
modeling of text data and genomics.
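As an illustration of the kind of connection the abstract refers to, the sketch below numerically checks one standard identity: independent Poisson counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the Poisson rates. This page does not say which relationships the paper proves, so treating this identity as representative is an assumption. The rates, counts, and function names are hypothetical and chosen only for the example.

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def multinomial_pmf(counts, probs):
    """P(X = counts) for X ~ Multinomial(sum(counts), probs)."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # stays an exact integer at every step
    return coef * math.prod(q**c for c, q in zip(counts, probs))

def conditional_poisson_pmf(counts, rates):
    """P(X = counts | sum(X) = n) for independent X_i ~ Poisson(rates[i])."""
    joint = math.prod(poisson_pmf(c, r) for c, r in zip(counts, rates))
    # The sum of independent Poissons is Poisson with the summed rate.
    total = poisson_pmf(sum(counts), sum(rates))
    return joint / total

# Hypothetical rates and observed counts, for illustration only.
rates = [2.0, 0.5, 3.5]
counts = [3, 1, 4]
probs = [r / sum(rates) for r in rates]

lhs = conditional_poisson_pmf(counts, rates)
rhs = multinomial_pmf(counts, probs)
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```

The check uses exact probability mass functions rather than simulation, so it is deterministic and needs only the standard library.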
Related papers
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Conjugate priors for count and rounded data regression [0.0]
We introduce conjugate priors that enable closed-form posterior inference.
Key posterior and predictive functionals are computable analytically or via direct Monte Carlo simulation.
These tools are broadly useful for linear regression, nonlinear models via basis expansions, and model and variable selection.
arXiv Detail & Related papers (2021-10-23T23:26:01Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - Nonparametric Functional Analysis of Generalized Linear Models Under Nonlinear Constraints [0.0]
This article introduces a novel nonparametric methodology for Generalized Linear Models.
It combines the strengths of the binary regression and latent variable formulations for categorical data.
It extends recently published parametric versions of the methodology and generalizes it.
arXiv Detail & Related papers (2021-10-11T04:49:59Z) - PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end.
We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees.
Our results open the way to applications of PSD models to density estimation, decision theory and inference.
arXiv Detail & Related papers (2021-06-30T15:13:39Z) - Bayesian Inference for Gamma Models [4.189643331553922]
We use the theory of normal variance-mean mixtures to derive a data augmentation scheme for models that include gamma functions.
We illustrate our methodology on a number of examples, including gamma shape inference, negative binomial regression and Dirichlet allocation.
arXiv Detail & Related papers (2021-06-03T14:58:39Z) - Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types [0.13764085113103217]
Missing values in data of mixed types are a common problem in many machine learning applications.
Gaussian copula models have been suggested as a means of imputing missing values within a probabilistic framework.
We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures.
arXiv Detail & Related papers (2021-02-04T14:42:29Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)