Divergences induced by dual subtractive and divisive normalizations of
exponential families and their convex deformations
- URL: http://arxiv.org/abs/2312.12849v2
- Date: Thu, 18 Jan 2024 00:39:29 GMT
- Title: Divergences induced by dual subtractive and divisive normalizations of
exponential families and their convex deformations
- Authors: Frank Nielsen
- Abstract summary: We show that skewed Bhattacharyya distances between probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function.
We then show how comparative convexity with respect to a pair of quasi-arithmetic means allows one to deform both convex functions and their arguments.
- Score: 7.070726553564701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exponential families are statistical models which are the workhorses in
statistics, information theory, and machine learning among others. An
exponential family can either be normalized subtractively by its cumulant or
free energy function or equivalently normalized divisively by its partition
function. Both subtractive and divisive normalizers are strictly convex and
smooth functions inducing pairs of Bregman and Jensen divergences. It is
well-known that skewed Bhattacharyya distances between probability densities
of an exponential family amount to skewed Jensen divergences induced by the
cumulant function between their corresponding natural parameters, and that in
limit cases the sided Kullback-Leibler divergences amount to reverse-sided
Bregman divergences. In this paper, we first show that the $\alpha$-divergences
between unnormalized densities of an exponential family amount to scaled
$\alpha$-skewed Jensen divergences induced by the partition function. We then
show how comparative convexity with respect to a pair of quasi-arithmetic means
allows one to deform both convex functions and their arguments, and thereby define
dually flat spaces with corresponding divergences when ordinary convexity is
preserved.
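As a concrete illustration of the identity recalled in the abstract (the $\alpha$-skewed Bhattacharyya distance between two densities of an exponential family equals the $\alpha$-skewed Jensen divergence of the cumulant function at their natural parameters), here is a minimal numerical sketch using the exponential-distribution family; the choice of family and the function names are illustrative assumptions, not taken from the paper.

```python
# Sketch: check numerically that the alpha-skewed Bhattacharyya distance
# between two exponential distributions equals the alpha-skewed Jensen
# divergence of the cumulant function F at their natural parameters.
# Family: p_theta(x) = exp(theta * x - F(theta)) on x > 0, with theta = -rate
# and cumulant F(theta) = -log(-theta).
import numpy as np
from scipy.integrate import quad

def cumulant(theta):
    # F(theta) for the exponential-distribution family (theta < 0).
    return -np.log(-theta)

def skewed_jensen(theta1, theta2, alpha):
    # J_{F,alpha}(theta1 : theta2) = alpha F(theta1) + (1 - alpha) F(theta2)
    #                                - F(alpha theta1 + (1 - alpha) theta2)
    return (alpha * cumulant(theta1) + (1 - alpha) * cumulant(theta2)
            - cumulant(alpha * theta1 + (1 - alpha) * theta2))

def skewed_bhattacharyya(rate1, rate2, alpha):
    # D_{B,alpha}(p1 : p2) = -log \int p1(x)^alpha p2(x)^(1 - alpha) dx
    p1 = lambda x: rate1 * np.exp(-rate1 * x)
    p2 = lambda x: rate2 * np.exp(-rate2 * x)
    integral, _ = quad(lambda x: p1(x) ** alpha * p2(x) ** (1 - alpha), 0, np.inf)
    return -np.log(integral)

rate1, rate2, alpha = 0.5, 2.0, 0.3
print(skewed_bhattacharyya(rate1, rate2, alpha))   # numerical integration
print(skewed_jensen(-rate1, -rate2, alpha))        # closed form via the cumulant F
```

Both printed values coincide up to numerical integration error.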
Related papers
- Unbiased Estimating Equation on Inverse Divergence and Its Conditions [0.10742675209112622]
This paper focuses on the Bregman divergence defined by the reciprocal function, called the inverse divergence.
For the loss function defined by a monotonically increasing function $f$ and the inverse divergence, the paper clarifies the conditions on the statistical model and on $f$ under which the estimating equation is unbiased.
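Reading "the Bregman divergence defined by the reciprocal function" literally, a minimal worked form with generator $f(u) = 1/u$ on $u > 0$ is given below; this is a sketch of the definition under that assumption, not a formula quoted from that paper.

```latex
% Bregman divergence generated by the reciprocal function f(u) = 1/u, u > 0:
% B_f(x : y) = f(x) - f(y) - f'(y)(x - y), with f'(y) = -1/y^2.
\[
  B_f(x : y) = \frac{1}{x} - \frac{1}{y} + \frac{x - y}{y^{2}}
             = \frac{(x - y)^{2}}{x\, y^{2}} \;\ge\; 0 .
\]
```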
arXiv Detail & Related papers (2024-04-25T11:22:48Z)
- Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graph relating them.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We pinpoint exactly when and where double descent appears, and show that its location is not inherently tied to the threshold p=n.
This resolves the apparent tension between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Causal Modeling with Stationary Diffusions [89.94899196106223]
We learn differential equations whose stationary densities model a system's behavior under interventions.
We show that they generalize to unseen interventions on their variables, often better than classical approaches.
Our inference method is based on a new theoretical result that expresses a stationarity condition on the diffusion's generator in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2023-10-26T14:01:17Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
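As a toy illustration of approximating a Gateaux derivative by finite differencing (here for the mean functional, whose influence function is the classical $x - \mathbb{E}_P[X]$): the functional, step size, and helper names below are illustrative assumptions, not taken from the paper.

```python
# Sketch: approximate the Gateaux derivative of a statistical functional T at
# the empirical distribution P_n, in the direction of a point mass at x:
#   (T((1 - eps) P_n + eps * delta_x) - T(P_n)) / eps.
# For the mean functional T(P) = E_P[X] this recovers IF(x) = x - E_P[X].
import numpy as np

def mean_functional(points, weights):
    # T evaluated on a weighted discrete distribution.
    return np.average(points, weights=weights)

def gateaux_finite_difference(functional, sample, x, eps=1e-4):
    n = len(sample)
    base_weights = np.full(n, 1.0 / n)
    # Mix the empirical distribution with a point mass at x.
    perturbed_points = np.append(sample, x)
    perturbed_weights = np.append((1 - eps) * base_weights, eps)
    return (functional(perturbed_points, perturbed_weights)
            - functional(sample, base_weights)) / eps

rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=2.0, size=10_000)
x = 3.5
print(gateaux_finite_difference(mean_functional, sample, x))  # finite difference
print(x - sample.mean())                                      # exact influence value
```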
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- On the Kullback-Leibler divergence between pairwise isotropic Gaussian-Markov random fields [93.35534658875731]
We derive expressions for the Kullback-Leibler divergence between two pairwise isotropic Gaussian-Markov random fields.
The proposed expressions allow the development of novel similarity measures in image processing and machine learning applications.
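The paper's expressions are specific to pairwise isotropic Gaussian-Markov random fields; as a generic reference point only (not the paper's formula), the Kullback-Leibler divergence between two $d$-dimensional Gaussians is:

```latex
% KL divergence between N(mu_1, Sigma_1) and N(mu_2, Sigma_2) in d dimensions.
\[
  D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1,\Sigma_1) \,\|\, \mathcal{N}(\mu_2,\Sigma_2)\right)
  = \frac{1}{2}\left[
      \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
      + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
      - d
      + \log\frac{\det\Sigma_2}{\det\Sigma_1}
    \right].
\]
```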
arXiv Detail & Related papers (2022-03-24T16:37:24Z)
- An Indirect Rate-Distortion Characterization for Semantic Sources: General Model and the Case of Gaussian Observation [83.93224401261068]
The source model is motivated by the recent surge of interest in the semantic aspect of information.
The intrinsic state corresponds to the semantic feature of the source, which in general is not observable.
The resulting rate-distortion function is the semantic rate-distortion function of the source.
arXiv Detail & Related papers (2022-01-29T02:14:24Z)
- Decoherence factor as a convolution: an interplay between a Gaussian and an exponential coherence loss [3.800391908440439]
We show that the decoherence factor can be described by the convolution of Gaussian and exponential functions.
The mechanism is demonstrated with two paradigmatic examples of decoherence -- a spin-bath model and the quantum Brownian motion.
arXiv Detail & Related papers (2021-10-18T16:55:16Z)
- Equivalence of Convergence Rates of Posterior Distributions and Bayes Estimators for Functions and Nonparametric Functionals [4.375582647111708]
We study the posterior contraction rates of a Bayesian method with Gaussian process priors in nonparametric regression.
For a general class of kernels, we establish convergence rates of the posterior measure of the regression function and its derivatives.
Our proof shows that, under certain conditions, to any convergence rate of Bayes estimators there corresponds the same convergence rate of the posterior distributions.
arXiv Detail & Related papers (2020-11-27T19:11:56Z)
- Optimal Bounds between $f$-Divergences and Integral Probability Metrics [8.401473551081748]
Families of $f$-divergences and Integral Probability Metrics are widely used to quantify similarity between probability distributions.
We systematically study the relationship between these two families from the perspective of convex duality.
We obtain new bounds while also recovering in a unified manner well-known results, such as Hoeffding's lemma.
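For reference, the two families being compared are defined as follows (standard textbook definitions, with $P \ll Q$ assumed for the $f$-divergence; the notation is not taken from the paper):

```latex
% f-divergence generated by a convex f with f(1) = 0, and the integral
% probability metric (IPM) generated by a function class G.
\[
  D_f(P \,\|\, Q) = \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\mathrm{d}Q,
  \qquad
  \mathrm{IPM}_{\mathcal{G}}(P, Q) = \sup_{g \in \mathcal{G}}
  \left| \int g\,\mathrm{d}P - \int g\,\mathrm{d}Q \right| .
\]
```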
arXiv Detail & Related papers (2020-06-10T17:39:11Z)
- Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family [38.13659821903422]
In this work, we report (dis)similarity formulas which bypass the explicit use of the cumulant function.
Our method only requires partially factorizing the densities of the considered exponential family in canonical form.
arXiv Detail & Related papers (2020-03-05T07:46:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.