On Computationally Efficient Learning of Exponential Family
Distributions
- URL: http://arxiv.org/abs/2309.06413v1
- Date: Tue, 12 Sep 2023 17:25:32 GMT
- Title: On Computationally Efficient Learning of Exponential Family
Distributions
- Authors: Abhin Shah, Devavrat Shah, Gregory W. Wornell
- Abstract summary: We focus on the setting where the support as well as the natural parameters are appropriately bounded.
Our method achieves the order-optimal sample complexity of $O({\sf log}(k)/\alpha^2)$ when tailored for node-wise-sparse Markov random fields.
- Score: 33.229944519289795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the classical problem of learning, with arbitrary accuracy, the
natural parameters of a $k$-parameter truncated \textit{minimal} exponential
family from i.i.d. samples in a computationally and statistically efficient
manner. We focus on the setting where the support as well as the natural
parameters are appropriately bounded. While the traditional maximum likelihood
estimator for this class of exponential family is consistent, asymptotically
normal, and asymptotically efficient, evaluating it is computationally hard. In
this work, we propose a novel loss function and a computationally efficient
estimator that is consistent as well as asymptotically normal under mild
conditions. We show that, at the population level, our method can be viewed as
the maximum likelihood estimation of a re-parameterized distribution belonging
to the same class of exponential family. Further, we show that our estimator
can be interpreted as a solution to minimizing a particular Bregman score as
well as an instance of minimizing the \textit{surrogate} likelihood. We also
provide finite sample guarantees to achieve an error (in $\ell_2$-norm) of
$\alpha$ in the parameter estimation with sample complexity $O({\sf
poly}(k)/\alpha^2)$. Our method achieves the order-optimal sample complexity of
$O({\sf log}(k)/\alpha^2)$ when tailored for node-wise-sparse Markov random
fields. Finally, we demonstrate the performance of our estimator via numerical
experiments.
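
To make the construction concrete, below is a minimal sketch of an interaction-screening-style convex surrogate in the spirit of the paper's loss (not the authors' exact estimator) for the one-dimensional family $p_\theta(x) \propto \exp(\theta x)$ supported on $[-1, 1]$. The true parameter, step size, and iteration count are assumptions for illustration. Because the sufficient statistic $x$ integrates to zero over the support, the surrogate $\frac{1}{n}\sum_i \exp(-\theta x_i)$ has population minimizer exactly $\theta^*$, so the intractable partition function never needs to be evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 1.5                    # hypothetical true natural parameter
n = 200_000

# Draw i.i.d. samples from p(x) proportional to exp(theta_star * x) on [-1, 1]
# by inverting the CDF.
u = rng.uniform(size=n)
x = np.log(np.exp(-theta_star) + u * (np.exp(theta_star) - np.exp(-theta_star))) / theta_star

# Convex surrogate S(theta) = (1/n) * sum_i exp(-theta * x_i). Since the
# sufficient statistic integrates to zero over [-1, 1], the population
# minimizer of S is exactly theta_star; no partition function appears.
def surrogate_grad(theta):
    return -np.mean(x * np.exp(-theta * x))

theta = 0.0
for _ in range(500):                # plain gradient descent on the convex loss
    theta -= 0.5 * surrogate_grad(theta)

print(f"true theta: {theta_star:.3f}, estimate: {theta:.3f}")
```

Gradient descent suffices here because the surrogate is convex in $\theta$; in higher dimensions the analogous objective remains convex in the natural parameters.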
Related papers
- A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative functional estimation procedure to address OT problems from samples.
We show that our semismooth Newton (SSN) method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- Provable benefits of score matching [30.317535687908755]
We give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize.
We show that designing a zeroth-order or first-order oracle for optimizing the likelihood loss is NP-hard.
Minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension.
arXiv Detail & Related papers (2023-06-03T03:42:30Z)
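
For intuition on the score-matching paper above: for an exponential family, the score matching loss is quadratic in the natural parameters, so minimizing it is a single linear solve with no partition function. The sketch below uses a toy Gaussian in natural form, $p_\theta(x) \propto \exp(\theta_1 x - \theta_2 x^2/2)$; the ground-truth values are assumptions, and this is not the family constructed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.7, 1.3                         # hypothetical ground truth
x = rng.normal(mu, sigma, size=100_000)

# Sufficient statistics phi(x) = (x, -x^2/2), so grad phi = (1, -x) and
# laplacian phi = (0, -1).
grad_phi = np.stack([np.ones_like(x), -x])   # shape (2, n)
A = grad_phi @ grad_phi.T / x.size           # empirical E[grad_phi grad_phi^T]
b = np.array([0.0, -1.0])                    # E[laplacian phi]

# The score matching loss is 0.5 * theta^T A theta + b^T theta,
# minimized in closed form by one linear solve.
theta_hat = np.linalg.solve(A, -b)
print("estimate:", theta_hat, " truth:", [mu / sigma**2, 1 / sigma**2])
```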
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method that achieved the previously best-known complexity result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
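
A minimal sketch of the stochastic augmented Lagrangian template from the paper above, on a hypothetical toy problem of our own: a quadratic objective with one expectation constraint that is only observable through noisy samples. The penalty parameter, step sizes, and iteration counts are illustrative assumptions, and the inner minimization is deliberately inexact (a fixed number of SGD steps).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem (hypothetical): minimize ||x - c||^2 subject to E[a.x - b + noise] <= 0.
c, a, b = np.array([2.0, 2.0]), np.array([1.0, 1.0]), 1.0
noisy_constraint = lambda x: a @ x - b + rng.normal(scale=0.1)

x, lam, rho = np.zeros(2), 0.0, 5.0
for _ in range(30):                          # outer augmented-Lagrangian loop
    for _ in range(300):                     # inexact inner minimization via SGD
        g = noisy_constraint(x)              # noisy constraint evaluation
        grad = 2 * (x - c) + rho * max(g + lam / rho, 0.0) * a
        x -= 0.01 * grad
    lam = max(lam + rho * noisy_constraint(x), 0.0)   # multiplier update

print("x:", x, " constraint value a.x - b:", a @ x - b)
```

The constrained optimum of this toy instance is $x^* = (0.5, 0.5)$, which the multiplier updates recover approximately despite the noisy constraint oracle.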
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between the statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the infinite-sample regime and in the finite-sample regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Asymptotic Bounds for Smoothness Parameter Estimates in Gaussian Process Interpolation [3.8979646385036175]
The smoothness of a Matérn kernel determines many important properties of the model in the large-data limit.
We prove that the maximum likelihood estimate of the smoothness parameter cannot asymptotically undersmooth the truth.
We show that maximum likelihood estimation recovers the true smoothness for a class of compactly supported self-similar functions.
arXiv Detail & Related papers (2022-03-10T14:45:57Z)
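
To illustrate what maximum likelihood estimation of the Matérn smoothness parameter involves, the sketch below profiles a Gaussian-process log marginal likelihood over a few candidate values of $\nu$. The data, length scale, and $\nu$ grid are our own assumptions, not the paper's setup; on this infinitely smooth target one expects larger $\nu$ to score higher, loosely mirroring the result that the MLE does not undersmooth.

```python
import numpy as np
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(6 * X[:, 0])                       # an infinitely smooth target function

def log_marginal_likelihood(nu):
    """GP log marginal likelihood with a Matern kernel of smoothness nu."""
    K = Matern(length_scale=0.3, nu=nu)(X) + 1e-6 * np.eye(len(X))  # jitter for stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(X) * np.log(2 * np.pi)

for nu in [0.5, 1.5, 2.5, 3.5]:               # candidate smoothness values
    print(f"nu = {nu}: log marginal likelihood = {log_marginal_likelihood(nu):.1f}")
```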
- A Computationally Efficient Method for Learning Exponential Family Distributions [29.289136623702056]
We consider the question of learning the natural parameters of a $k$ parameter exponential family from i.i.d. samples in a computationally and statistically efficient manner.
We show that our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family.
arXiv Detail & Related papers (2021-10-28T18:42:04Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
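
A minimal sketch of the clipped-gradient idea from the streaming-estimation paper above, specialized to streaming mean estimation with $O(1)$ memory; the clipping level, the $1/t$ step sizes, and the heavy-tailed distribution are illustrative assumptions, not the paper's tuned choices.

```python
import numpy as np

def clipped_streaming_mean(stream, clip=5.0):
    """Streaming mean estimation with clipped stochastic gradients.

    For the loss 0.5 * (theta - x)^2 the stochastic gradient is (theta - x);
    clipping it limits the damage any single heavy-tailed sample can do,
    while keeping O(1) memory per step.
    """
    theta = 0.0
    for t, x in enumerate(stream, start=1):
        g = np.clip(theta - x, -clip, clip)   # clipped stochastic gradient
        theta -= g / t                        # decaying step size 1/t
    return theta

rng = np.random.default_rng(4)
stream = 3.0 + rng.standard_t(df=2.5, size=50_000)   # heavy-tailed, true mean 3
print(clipped_streaming_mean(stream))                # close to 3.0
```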
- On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving averages (SEMA) for solving non-convex optimization problems.
Building on the SEMA estimator, we establish state-of-the-art results for a range of these problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
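
A hedged sketch of a stochastic moving-average (SEMA-style) gradient estimator inside SGD, applied to a toy non-convex objective of our own choosing; the beta parameter, the step size, and the objective are assumptions for illustration only.

```python
import numpy as np

def sgd_with_moving_average(grad_oracle, theta0, beta=0.9, lr=0.02, steps=5000):
    """SGD where the update direction is an exponential moving average of
    noisy stochastic gradients: z_t = (1 - beta) * g_t + beta * z_{t-1}."""
    theta = np.asarray(theta0, dtype=float)
    z = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_oracle(theta)            # unbiased but noisy gradient sample
        z = (1 - beta) * g + beta * z     # variance-reduced moving average
        theta = theta - lr * z
    return theta

# Toy non-convex objective f(x) = (x^2 - 1)^2 with noisy gradients.
rng = np.random.default_rng(5)
oracle = lambda th: 4 * th * (th**2 - 1) + rng.normal(scale=2.0, size=th.shape)
print(sgd_with_moving_average(oracle, [2.0]))   # lands near a minimizer at +/- 1
```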
- Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the maximum pseudolikelihood (MPL) estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z)
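
A minimal sketch of the maximum pseudolikelihood (MPL) estimate from the tensor Ising paper above, specialized to the classical $p = 2$ Curie-Weiss model. The inverse temperature is set above the critical point (where estimation from a single sample is possible), and the Glauber-dynamics sampler and all constants are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
N, beta_true = 1000, 1.25                 # above the critical point beta = 1

# Draw a single sample from the Curie-Weiss model via Glauber dynamics.
sigma = rng.choice([-1.0, 1.0], size=N)
for _ in range(200):                      # sweeps over all spins
    for i in rng.permutation(N):
        m_rest = (sigma.sum() - sigma[i]) / N           # mean field felt by spin i
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta_true * m_rest))
        sigma[i] = 1.0 if rng.uniform() < p_plus else -1.0

def neg_log_pseudolikelihood(beta):
    """-sum_i log P(sigma_i | sigma_{-i}), with P a logistic function of the
    local mean field."""
    m_rest = (sigma.sum() - sigma) / N
    return np.sum(np.log1p(np.exp(-2.0 * beta * sigma * m_rest)))

res = minimize_scalar(neg_log_pseudolikelihood, bounds=(0.0, 3.0), method="bounded")
print(f"beta_true = {beta_true}, MPL estimate = {res.x:.3f}")
```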
- Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, $\epsilon$-suboptimality in expected quadratic distance to the optimal solution is attained after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
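
As a concrete instance of the minibatch stochastic proximal gradient template described above, the sketch below applies it to a lasso objective, where the proximal step is soft-thresholding; the problem instance, step size, and batch size are our own assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, the nonsmooth component."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def minibatch_prox_sgd(A, y, lam=0.1, batch=64, lr=0.01, iters=3000, seed=0):
    """Minibatch stochastic proximal gradient for the composite objective
    F(w) = (1/2n) * ||A w - y||^2 + lam * ||w||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        grad = A[idx].T @ (A[idx] @ w - y[idx]) / batch   # minibatch gradient, smooth part
        w = soft_threshold(w - lr * grad, lr * lam)       # proximal (shrinkage) step
    return w

rng = np.random.default_rng(7)
A = rng.normal(size=(500, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]                             # sparse ground truth
y = A @ w_true + 0.1 * rng.normal(size=500)
print(np.round(minibatch_prox_sgd(A, y), 2))
```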
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.