On Computationally Efficient Learning of Exponential Family
Distributions
- URL: http://arxiv.org/abs/2309.06413v1
- Date: Tue, 12 Sep 2023 17:25:32 GMT
- Title: On Computationally Efficient Learning of Exponential Family
Distributions
- Authors: Abhin Shah, Devavrat Shah, Gregory W. Wornell
- Abstract summary: We focus on the setting where the support as well as the natural parameters are appropriately bounded.
Our method achives the order-optimal sample complexity of $O(sf log(k)/alpha2)$ when tailored for node-wise-sparse random fields.
- Score: 33.229944519289795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the classical problem of learning, with arbitrary accuracy, the
natural parameters of a $k$-parameter truncated \textit{minimal} exponential
family from i.i.d. samples in a computationally and statistically efficient
manner. We focus on the setting where the support as well as the natural
parameters are appropriately bounded. While the traditional maximum likelihood
estimator for this class of exponential family is consistent, asymptotically
normal, and asymptotically efficient, evaluating it is computationally hard. In
this work, we propose a novel loss function and a computationally efficient
estimator that is consistent as well as asymptotically normal under mild
conditions. We show that, at the population level, our method can be viewed as
the maximum likelihood estimation of a re-parameterized distribution belonging
to the same class of exponential family. Further, we show that our estimator
can be interpreted as a solution to minimizing a particular Bregman score as
well as an instance of minimizing the \textit{surrogate} likelihood. We also
provide finite sample guarantees to achieve an error (in $\ell_2$-norm) of
$\alpha$ in the parameter estimation with sample complexity $O({\sf
poly}(k)/\alpha^2)$. Our method achives the order-optimal sample complexity of
$O({\sf log}(k)/\alpha^2)$ when tailored for node-wise-sparse Markov random
fields. Finally, we demonstrate the performance of our estimator via numerical
experiments.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O (1/sqrtk)$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - Provable benefits of score matching [30.317535687908755]
We give the first example of a natural exponential family of distributions such that score matching loss is computationally efficient to optimize.
We show that designing a zeroth-order or first-order oracle for optimizing the likelihood loss is NP-hard.
Minimizing the score matching loss is both computationally and statistically efficient, with complexity in the ambient dimension.
arXiv Detail & Related papers (2023-06-03T03:42:30Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Asymptotic Bounds for Smoothness Parameter Estimates in Gaussian Process
Interpolation [3.8979646385036175]
The smoothness of a Mat'ern kernel determines many important properties of the model in the large data limit.
We prove that the maximum likelihood estimate of the smoothness parameter cannot be understated under the truth.
We show that maximum likelihood estimation recovers the true smoothness for a class of compactly supported self-similar functions.
arXiv Detail & Related papers (2022-03-10T14:45:57Z) - A Computationally Efficient Method for Learning Exponential Family
Distributions [29.289136623702056]
We consider the question of learning the natural parameters of a $k$ parameter exponential family from i.i.d. samples in a computationally and statistically efficient manner.
We show that our method can be viewed as the maximum likelihood estimation of a re-ized distribution belonging to the same class of exponential family.
arXiv Detail & Related papers (2021-10-28T18:42:04Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $sqrt N$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity for constant and variable stepsize iteration policies obtaining that, for minibatch size $N$, after $mathcalO(frac1Nepsilon)$ $epsilon-$subity is attained in expected quadratic distance to optimal solution.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.