Squared families: Searching beyond regular probability models
- URL: http://arxiv.org/abs/2503.21128v1
- Date: Thu, 27 Mar 2025 03:39:35 GMT
- Title: Squared families: Searching beyond regular probability models
- Authors: Russell Tsuchida, Jiawei Liu, Cheng Soon Ong, Dino Sejdinovic
- Abstract summary: Squared families are families of probability densities obtained by squaring a linear transformation of a statistic. Their Fisher information is a conformal transformation of the Hessian metric induced from a Bregman generator. The squared family kernel is the only integral that needs to be computed for the Fisher information, statistical divergence and normalising constant.
- Score: 22.68738495315807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce squared families, which are families of probability densities obtained by squaring a linear transformation of a statistic. Squared families are singular; however, their singularity can easily be handled so that they form regular models. After handling the singularity, squared families possess many convenient properties. Their Fisher information is a conformal transformation of the Hessian metric induced from a Bregman generator. The Bregman generator is the normalising constant, and yields a statistical divergence on the family. The normalising constant admits a helpful parameter-integral factorisation, meaning that only one parameter-independent integral needs to be computed for all normalising constants in the family, unlike in exponential families. Finally, the squared family kernel is the only integral that needs to be computed for the Fisher information, statistical divergence and normalising constant. We then describe how squared families are special in the broader class of $g$-families, which are obtained by applying a sufficiently regular function $g$ to a linear transformation of a statistic. After removing special singularities, positively homogeneous families and exponential families are the only $g$-families for which the Fisher information is a conformal transformation of the Hessian metric, where the generator depends on the parameter only through the normalising constant. Even-order monomial families also admit parameter-integral factorisations, unlike exponential families. We study parameter estimation and density estimation in squared families, in the well-specified and misspecified settings. We use a universal approximation property to show that squared families can learn sufficiently well-behaved target densities at a rate of $\mathcal{O}(N^{-1/2})+C n^{-1/4}$, where $N$ is the number of datapoints, $n$ is the number of parameters, and $C$ is some constant.
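As a hedged illustration of the parameter-integral factorisation (not the authors' code), the sketch below reads the definition as $p_\theta(x) \propto (\theta^\top \phi(x))^2 \mu(x)$ for a statistic $\phi$ and base measure $\mu$; the statistic, base measure, and parameter values are illustrative. The single parameter-independent integral $K = \int \phi(x)\phi(x)^\top \mathrm{d}\mu(x)$ (the squared family kernel) then yields every normalising constant as $Z(\theta) = \theta^\top K \theta$.

```python
# A minimal sketch (assumptions labelled above) of the squared-family
# factorisation: one kernel integral serves all parameter values.
import numpy as np

def kernel_matrix(phi, mu_samples):
    """Monte Carlo estimate of K = E_mu[phi(X) phi(X)^T],
    the single parameter-independent integral of the family."""
    feats = np.stack([phi(x) for x in mu_samples])   # (n_samples, n_params)
    return feats.T @ feats / len(mu_samples)

def normalising_constant(theta, K):
    """Z(theta) = theta^T K theta: no new integral per parameter value."""
    return theta @ K @ theta

rng = np.random.default_rng(0)
phi = lambda x: np.array([1.0, x, x**2])             # illustrative statistic
mu_samples = rng.standard_normal(10_000)             # base measure: N(0, 1)
K = kernel_matrix(phi, mu_samples)                   # computed once
for theta in (np.array([1.0, 0.0, 0.5]), np.array([0.2, -1.0, 0.3])):
    print(normalising_constant(theta, K))            # reused for every theta
```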
Related papers
- Estimating the normal-inverse-Wishart distribution [0.6216023343793144]
We describe a convergent procedure for converting from mean parameters to natural parameters in the NIW family.
This is needed when using a NIW base family in expectation propagation.
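The NIW-specific updates are not spelled out in this summary; as a hedged, purely illustrative sketch of the general mean-to-natural conversion problem, the snippet below inverts the mean map $\nabla A$ of a one-dimensional exponential family (Bernoulli) by Newton's method, which converges because the log-partition function $A$ is convex.

```python
# Hedged illustration (not the paper's NIW procedure): converting mean
# parameters to natural parameters amounts to inverting the gradient of
# the log-partition function A. For the Bernoulli family,
# A(eta) = log(1 + exp(eta)), so dA/deta = sigmoid(eta) is the mean map.
import math

def mean_to_natural(mu, iters=50):
    """Newton's method for dA/deta = mu; A is convex, so this converges."""
    eta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-eta))      # dA/deta
        hess = p * (1.0 - p)                  # d^2A/deta^2 > 0
        eta -= (p - mu) / hess
    return eta

print(mean_to_natural(0.8))  # ~ logit(0.8) ~ 1.386
```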
arXiv Detail & Related papers (2024-05-25T06:39:39Z)
- Generalized Data Thinning Using Sufficient Statistics [2.3488056916440856]
A recent paper showed that for some well-known natural exponential families, $X$ can be "thinned" into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^{K} X^{(k)}$.
These independent random variables can then be used for various model validation and inference tasks, including in contexts where traditional sample splitting fails.
In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$.
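A minimal sketch of this kind of thinning for the Poisson family (a standard example, assumed here rather than taken from the paper): $X \sim \mathrm{Poisson}(\lambda)$ splits into $K$ independent $\mathrm{Poisson}(\lambda/K)$ pieces that sum exactly to $X$.

```python
# Poisson data thinning: given X, a multinomial draw produces K pieces
# that are marginally independent Poisson(lam/K) and sum exactly to X.
import numpy as np

rng = np.random.default_rng(0)

def thin_poisson(x, K):
    """Split a Poisson count x into K exchangeable pieces summing to x."""
    return rng.multinomial(x, np.full(K, 1.0 / K))

x = rng.poisson(20.0)
pieces = thin_poisson(x, K=4)        # each piece ~ Poisson(5.0), independent
assert pieces.sum() == x             # exact reconstruction X = sum_k X^(k)
print(x, pieces)
```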
arXiv Detail & Related papers (2023-03-22T22:00:50Z)
- General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that, for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best Gaussian noise mechanism.
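A minimal sketch of the baseline Gaussian noise mechanism described above, with noise calibrated for $\rho$-zCDP (the standard Bun–Steinke calibration $\sigma = \Delta/\sqrt{2\rho}$); the dataset, bounds, and privacy level below are illustrative.

```python
# Gaussian noise mechanism for private mean estimation: compute the true
# mean of bounded data, then add unbiased Gaussian noise scaled to the
# change-one sensitivity of the mean under rho-zCDP.
import numpy as np

rng = np.random.default_rng(0)

def private_mean(data, lo, hi, rho):
    """Unbiased Gaussian-noise mean estimator for data in [lo, hi]."""
    n = len(data)
    sensitivity = (hi - lo) / n              # change-one sensitivity of the mean
    sigma = sensitivity / np.sqrt(2.0 * rho) # rho-zCDP Gaussian mechanism
    return np.mean(data) + rng.normal(0.0, sigma)

data = rng.uniform(0.0, 1.0, size=1_000)
print(private_mean(data, lo=0.0, hi=1.0, rho=0.5))
```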
arXiv Detail & Related papers (2023-01-31T18:47:42Z)
- Clustering above Exponential Families with Tempered Exponential Measures [28.532545355403123]
The link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data generating distributions.
Extending the framework beyond exponential families is important to lift roadblocks such as the lack of robustness of some population minimizers, which is carved into their axiomatization.
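A hedged sketch of the classical Bregman $k$-means this link refers to (Banerjee et al. style, not the paper's tempered extension): for any Bregman divergence, the arithmetic mean remains the optimal centroid, so only the assignment step changes.

```python
# Bregman hard clustering: assignments use a Bregman divergence d;
# the centroid update is the plain mean for every such d.
import numpy as np

def bregman_kmeans(X, k, d, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to the centroid with smallest divergence
        labels = np.argmin([[d(x, c) for c in centers] for x in X], axis=1)
        # mean update is optimal for any Bregman divergence; keep old
        # centers for empty clusters
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

squared_euclidean = lambda x, c: np.sum((x - c) ** 2)   # phi(x) = ||x||^2
X = np.random.default_rng(1).normal(size=(200, 2)) + np.repeat(
    np.array([[0.0, 0.0], [4.0, 4.0]]), 100, axis=0)    # two blobs
labels, centers = bregman_kmeans(X, k=2, d=squared_euclidean)
print(centers)
```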
arXiv Detail & Related papers (2022-11-04T21:58:40Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
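A hedged sketch of the model itself (not the paper's sketching or coreset machinery): replace the standard probit link with the CDF of a $p$-generalized normal, available as scipy's gennorm, and maximise the Bernoulli likelihood numerically; the synthetic data and optimiser settings are illustrative, and $p = 2$ recovers a rescaled probit.

```python
# p-generalized probit regression: a GLM for binary responses whose link
# is the CDF of a p-generalized normal distribution.
import numpy as np
from scipy.stats import gennorm
from scipy.optimize import minimize

def neg_log_likelihood(w, X, y, p):
    eta = X @ w
    F = np.clip(gennorm.cdf(eta, p), 1e-12, 1 - 1e-12)  # link: gennorm CDF
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
w_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=500) < gennorm.cdf(X @ w_true, 2.0)).astype(float)

res = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y, 2.0))
print(res.x)   # maximum likelihood estimate, should be near w_true
```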
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Exponential Family Model-Based Reinforcement Learning via Score Matching [97.31477125728844]
We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL).
SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression.
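A minimal sketch of why score matching is cheap here (the standard Hyvärinen computation, assumed rather than taken from the paper): for $p_\theta(x) \propto \exp(\theta^\top \phi(x))$ the score matching objective is quadratic in $\theta$, so the ridge-regularised estimator is available in closed form.

```python
# Score matching for an unnormalized exponential family: the Hyvarinen
# objective is (1/2) theta^T A theta + theta^T b with
# A = E[grad phi grad phi^T], b = E[laplacian phi], so the ridge estimator
# is theta = -(A + lam*I)^{-1} b. With phi(x) = (x, x^2) the family
# contains all Gaussians.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=50_000)      # data from N(1, 4)

grad_phi = np.stack([np.ones_like(x), 2.0 * x])      # d/dx of (x, x^2)
lap_phi = np.stack([np.zeros_like(x), 2.0 * np.ones_like(x)])

A = grad_phi @ grad_phi.T / x.size                   # E[grad phi grad phi^T]
b = lap_phi.mean(axis=1)                             # E[laplacian phi]
lam = 1e-6                                           # ridge regularization
theta = -np.linalg.solve(A + lam * np.eye(2), b)

print(theta)   # ~ (mu/sigma^2, -1/(2 sigma^2)) = (0.25, -0.125)
```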
arXiv Detail & Related papers (2021-12-28T15:51:07Z)
- Kernel Deformed Exponential Families for Sparse Continuous Attention [76.61129971916702]
We show existence results for kernel exponential and deformed exponential families.
Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.
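A hedged sketch of the deformed (Tsallis $q$-) exponential underlying such families: for $q < 1$, $\exp_q$ hits exactly zero, so densities $\propto \exp_q(f(x))$ can vanish outside compact regions, which is the source of the sparse, compactly supported attention; the function $f$ below is illustrative.

```python
# The q-exponential exp_q(u) = max(1 + (1-q)u, 0)^(1/(1-q)) for q != 1.
# Unlike exp, it reaches exactly zero when q < 1.
import numpy as np

def exp_q(u, q):
    if q == 1.0:
        return np.exp(u)
    return np.maximum(1.0 + (1.0 - q) * u, 0.0) ** (1.0 / (1.0 - q))

x = np.linspace(-4.0, 4.0, 9)
f = 1.0 - x**2                       # illustrative unnormalized log-score
print(exp_q(f, q=0.5))               # exact zeros where 1 + 0.5*f <= 0
print(np.exp(f))                     # ordinary exponential: never zero
```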
arXiv Detail & Related papers (2021-11-01T19:21:22Z)
- Bayesian Quadrature on Riemannian Data Manifolds [79.71142807798284]
Riemannian manifolds provide a principled way to model nonlinear geometric structure inherent in data.
However, the geometric operations involved are typically computationally demanding.
In particular, we focus on Bayesian quadrature (BQ) to numerically compute integrals over normal laws.
We show that by leveraging both prior knowledge and an active exploration scheme, BQ significantly reduces the number of required evaluations.
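A minimal sketch of vanilla Bayesian quadrature against a normal law (Euclidean, not the paper's Riemannian setting): with an RBF-kernel GP on the integrand, the kernel mean embedding is closed form and the BQ estimate is $z^\top K^{-1} y$; the nodes and lengthscale below are illustrative.

```python
# Bayesian quadrature for integrals against N(m, s^2): for the RBF kernel
# the kernel mean z_i = int k(x, x_i) dN(m, s^2) has a closed form, and the
# posterior mean of the integral is z^T K^{-1} y.
import numpy as np

ell, m, s = 1.0, 0.0, 1.0                      # kernel lengthscale; N(m, s^2) law
k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

x = np.linspace(-3.0, 3.0, 15)                 # quadrature nodes
y = x**2                                       # integrand; true integral is 1
K = k(x, x) + 1e-8 * np.eye(x.size)            # jitter for stability

# closed-form kernel mean of the RBF kernel under a Gaussian law
z = np.sqrt(ell**2 / (ell**2 + s**2)) * np.exp(-(x - m) ** 2 / (2 * (ell**2 + s**2)))

estimate = z @ np.linalg.solve(K, y)           # BQ posterior mean of the integral
print(estimate)                                # ~ 1.0 = E[X^2] under N(0, 1)
```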
arXiv Detail & Related papers (2021-02-12T17:38:04Z)
- Flexible mean field variational inference using mixtures of non-overlapping exponential families [6.599344783327053]
I show that standard mean field variational inference can fail to produce sensible results for models with sparsity-inducing priors.
I then show that any mixture of a diffuse exponential family and a point mass at zero, used to model sparsity, itself forms an exponential family.
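A minimal sketch of the spike-and-slab construction in question: a mixture of a point mass at zero (spike) and a diffuse Gaussian (slab), which produces exact zeros; the mixture weight and slab scale are illustrative.

```python
# Spike-and-slab draws: 0 with probability 1 - pi_slab, else a diffuse
# Gaussian. Relative to the mixed base measure delta_0 + Lebesgue this
# mixture has a density (and, per the paper, sits in an exponential family).
import numpy as np

rng = np.random.default_rng(0)

def spike_and_slab(n, pi_slab=0.2, slab_sd=2.0):
    """Draw n values; each is 0 w.p. 1 - pi_slab, else N(0, slab_sd^2)."""
    slab = rng.uniform(size=n) < pi_slab
    return np.where(slab, rng.normal(0.0, slab_sd, size=n), 0.0)

theta = spike_and_slab(10_000)
print(np.mean(theta == 0.0))   # ~ 0.8: exact zeros, i.e. genuine sparsity
```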
arXiv Detail & Related papers (2020-10-14T01:46:56Z)
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
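A hedged sketch of the standard strategy being debiased (not the paper's surrogate sketching or scaled regularisation): each worker solves a local ridge problem on its batch and the driver averages the local estimates, which is biased relative to the full-data solution; all problem sizes below are illustrative.

```python
# Averaged local second-order (ridge) estimates: the baseline whose bias
# the paper's scaled regularization corrects.
import numpy as np

rng = np.random.default_rng(0)
n, d, q, lam = 2_000, 20, 10, 1.0
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def ridge(Xb, yb, lam):
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ yb)

batches = np.array_split(rng.permutation(n), q)    # one batch per worker
w_avg = np.mean([ridge(X[b], y[b], lam) for b in batches], axis=0)
w_full = ridge(X, y, lam)

print(np.linalg.norm(w_avg - w_full))   # gap caused by the averaging bias
```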
arXiv Detail & Related papers (2020-07-02T18:08:14Z)
- Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family [38.13659821903422]
In this work, we report (dis)similarity formulas which bypass the explicit use of the cumulant function.
Our method requires only a partial canonical factorization of the densities of the considered exponential family.
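For contrast, a minimal sketch of the classical cumulant-based formula that this work bypasses: in an exponential family with log-normaliser $F$, $\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = B_F(\theta_2, \theta_1)$, checked here on exponential distributions with natural parameter $\theta = -\mathrm{rate}$ and $F(\theta) = -\log(-\theta)$.

```python
# KL divergence via the Bregman divergence of the cumulant function F:
# B_F(t2, t1) = F(t2) - F(t1) - (t2 - t1) * F'(t1).
import math

def bregman_kl(t1, t2, F, dF):
    """KL(p_t1 || p_t2) for an exponential family with log-normalizer F."""
    return F(t2) - F(t1) - (t2 - t1) * dF(t1)

F = lambda t: -math.log(-t)        # cumulant of Exp(rate = -t)
dF = lambda t: -1.0 / t            # mean parameter 1/rate

l1, l2 = 2.0, 5.0                  # two rates
print(bregman_kl(-l1, -l2, F, dF))                  # via the cumulant
print(math.log(l1 / l2) + l2 / l1 - 1.0)            # direct closed form
```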
arXiv Detail & Related papers (2020-03-05T07:46:22Z)