Inductive Mutual Information Estimation: A Convex Maximum-Entropy Copula
Approach
- URL: http://arxiv.org/abs/2102.13182v1
- Date: Thu, 25 Feb 2021 21:21:40 GMT
- Title: Inductive Mutual Information Estimation: A Convex Maximum-Entropy Copula
Approach
- Authors: Yves-Laurent Kom Samo
- Abstract summary: We propose a novel estimator of the mutual information between two ordinal vectors $x$ and $y$.
We prove that, so long as the constraint is feasible, this problem admits a unique solution, it is in the exponential family, and it can be learned by solving a convex optimization problem.
We show that our approach may be used to mitigate mode collapse in GANs by maximizing the entropy of the copula of fake samples.
- Score: 0.5330240017302619
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a novel estimator of the mutual information between two ordinal
vectors $x$ and $y$. Our approach is inductive (as opposed to deductive) in
that it depends on the data generating distribution solely through some
nonparametric properties revealing associations in the data, and does not
require having enough data to fully characterize the true joint distributions
$P_{x, y}$. Specifically, our approach consists of (i) noting that $I\left(y;
x\right) = I\left(u_y; u_x\right)$ where $u_y$ and $u_x$ are the
\emph{copula-uniform dual representations} of $y$ and $x$ (i.e. their images
under the probability integral transform), and (ii) estimating the copula
entropies $h\left(u_y\right)$, $h\left(u_x\right)$ and $h\left(u_y, u_x\right)$
by solving a maximum-entropy problem over the space of copula densities under a
constraint of the type $\alpha_m = E\left[\phi_m(u_y, u_x)\right]$. We prove
that, so long as the constraint is feasible, this problem admits a unique
solution, it is in the exponential family, and it can be learned by solving a
convex optimization problem. The resulting estimator, which we denote MIND, is
marginal-invariant, always non-negative, unbounded for any sample size $n$,
consistent, has MSE rate $O(1/n)$, and is more data-efficient than competing
approaches. Beyond mutual information estimation, we illustrate that our
approach may be used to mitigate mode collapse in GANs by maximizing the
entropy of the copula of fake samples, a model we refer to as Copula Entropy
Regularized GAN (CER-GAN).
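As a rough, hypothetical illustration of the two steps described in the abstract (not the authors' MIND implementation), the Python sketch below maps samples to their copula-uniform dual representations via empirical probability integral transforms (normalized ranks), fits a maximum-entropy copula density of exponential-family form $c_\theta(u) \propto \exp\left(\sum_m \theta_m \phi_m(u)\right)$ by minimizing its convex dual objective against the empirical moment constraints $\alpha_m = E\left[\phi_m\right]$, and reads off $I\left(y; x\right) = h\left(u_x\right) + h\left(u_y\right) - h\left(u_y, u_x\right)$ from the fitted copula entropies. The feature map `phi`, the Monte Carlo estimate of the log-partition function, and the generic L-BFGS solver are assumptions made here for brevity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import norm, rankdata


def copula_uniform(z):
    """Empirical probability integral transform: map each column of z to its
    normalized ranks in (0, 1), an estimate of the copula-uniform dual."""
    z = np.asarray(z).reshape(len(z), -1)
    n = z.shape[0]
    return np.column_stack([rankdata(z[:, j]) / (n + 1.0) for j in range(z.shape[1])])


def phi(u):
    """Illustrative feature map phi_m: pairwise products of Gaussian scores of u
    (a 'Gaussian-copula-like' choice made here for simplicity, not the paper's)."""
    g = norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))   # Gaussian scores of the uniforms
    i, j = np.triu_indices(g.shape[1])         # all pairs (i <= j), incl. diagonal
    return g[:, i] * g[:, j]                   # shape (n, d(d+1)/2)


def fit_maxent_copula(u, n_mc=20_000, seed=0):
    """Maximum-entropy copula density c_theta(u) ∝ exp(theta · phi(u)) under the
    constraints alpha_m = E[phi_m(u)], solved via its convex dual (average
    negative log-likelihood); returns theta and the fitted copula entropy."""
    rng = np.random.default_rng(seed)
    u_mc = rng.uniform(1e-6, 1 - 1e-6, size=(n_mc, u.shape[1]))  # reference for log Z
    f_data, f_mc = phi(u), phi(u_mc)

    def objective(theta):
        log_z = logsumexp(f_mc @ theta) - np.log(n_mc)  # Monte Carlo log-partition
        return log_z - (f_data @ theta).mean()          # convex in theta

    theta = minimize(objective, np.zeros(f_data.shape[1]), method="L-BFGS-B").x
    log_z = logsumexp(f_mc @ theta) - np.log(n_mc)
    entropy = log_z - theta @ f_data.mean(axis=0)       # h = log Z - theta · E[phi]
    return theta, entropy


def mind_like_mi(x, y):
    """I(y; x) = I(u_y; u_x) = h(u_x) + h(u_y) - h(u_y, u_x) on the copula scale."""
    ux, uy = copula_uniform(x), copula_uniform(y)
    h_x = fit_maxent_copula(ux)[1]
    h_y = fit_maxent_copula(uy)[1]
    h_xy = fit_maxent_copula(np.hstack([ux, uy]))[1]
    return max(h_x + h_y - h_xy, 0.0)                   # clip tiny Monte Carlo negatives
```

Under these assumptions, for scalar jointly Gaussian $x$ and $y$ with correlation $\rho$, `mind_like_mi` should approach $-\frac{1}{2}\log(1-\rho^2)$ nats as the sample size grows, since the Gaussian-score cross-moment constraint recovers the Gaussian copula.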
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
- The Sample Complexity Of ERMs In Stochastic Convex Optimization [13.896417716930687]
We show that in fact $\tilde{O}\left(\frac{d}{\epsilon}+\frac{1}{\epsilon^2}\right)$ data points are also sufficient.
We further generalize the result and show that a similar upper bound holds for all convex bodies.
arXiv Detail & Related papers (2023-11-09T14:29:25Z)
- Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
arXiv Detail & Related papers (2023-07-18T15:49:02Z)
- Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Localization in 1D non-parametric latent space models from pairwise affinities [6.982738885923206]
We consider the problem of estimating latent positions in a one-dimensional torus from pairwise affinities.
We introduce an estimation procedure that provably localizes all the latent positions with a maximum error of the order of $\sqrt{\log(n)/n}$, with high probability.
arXiv Detail & Related papers (2021-08-06T13:05:30Z)
- SDP Achieves Exact Minimax Optimality in Phase Synchronization [19.909352968029584]
We study the phase synchronization problem with noisy measurements $Y=z^*z^{*\mathrm{H}}+\sigma W\in\mathbb{C}^{n\times n}$.
We prove that SDP achieves the error bound $(1+o(1))\frac{\sigma^2}{2np}$ under a squared $\ell_2$ loss.
arXiv Detail & Related papers (2021-01-07T03:14:05Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$ (see the sketch after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
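For the linear-time Sinkhorn entry above, a minimal sketch of the mechanism (with a made-up positive feature map; the paper proposes its own constructions): choosing $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ makes the Gibbs kernel at unit regularization factor as $K=\Phi_x\Phi_y^\top$, where $\Phi$ stacks the $r$-dimensional features, so each Sinkhorn matrix-vector product costs $O((n+m)r)$ rather than $O(nm)$.

```python
import numpy as np

def positive_features(x, directions, temperature=1.0):
    """Hypothetical positive feature map phi: R^d -> R^r_+ (softmax of random
    projections); purely illustrative, not the construction from the paper."""
    z = x @ directions.T / temperature          # (n, r) projections
    z = np.exp(z - z.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)     # positive rows summing to 1

def sinkhorn_linear(phi_x, phi_y, a, b, n_iter=500):
    """Sinkhorn iterations with the Gibbs kernel implicitly factored as
    K = phi_x @ phi_y.T (regularization epsilon = 1, so exp(-c) = <phi(x), phi(y)>);
    each update costs O((n + m) r) instead of O(n m)."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (phi_x @ (phi_y.T @ v))         # u <- a / (K v)
        v = b / (phi_y @ (phi_x.T @ u))         # v <- b / (K^T u)
    # The coupling is P = diag(u) (phi_x @ phi_y.T) diag(v); it is never formed explicitly.
    return u, v
```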
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.