Statistical Efficiency of Score Matching: The View from Isoperimetry
- URL: http://arxiv.org/abs/2210.00726v1
- Date: Mon, 3 Oct 2022 06:09:01 GMT
- Title: Statistical Efficiency of Score Matching: The View from Isoperimetry
- Authors: Frederic Koehler, Alexander Heckett, Andrej Risteski
- Abstract summary: We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite sample regime and in the asymptotic regime.
- Score: 96.65637602827942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models parametrized up to a normalizing constant (e.g.
energy-based models) are difficult to train by maximizing the likelihood of the
data because the likelihood and/or gradients thereof cannot be explicitly or
efficiently written down. Score matching is a training method whereby, instead
of fitting the likelihood $\log p(x)$ for the training data, we fit the
score function $\nabla_x \log p(x)$ -- obviating the need to evaluate the
partition function. Though this estimator is known to be consistent, it is
unclear whether (and when) its statistical efficiency is comparable to that of
maximum likelihood -- which is known to be (asymptotically) optimal. We
initiate this line of inquiry in this paper, and show a tight connection
between statistical efficiency of score matching and the isoperimetric
properties of the distribution being estimated -- i.e. the Poincaré,
log-Sobolev and isoperimetric constant -- quantities which govern the mixing
time of Markov processes like Langevin dynamics. Roughly, we show that the
score matching estimator is statistically comparable to the maximum likelihood estimator
when the distribution has a small isoperimetric constant. Conversely, if the
distribution has a large isoperimetric constant -- even for simple families of
distributions like exponential families with rich enough sufficient statistics
-- score matching will be substantially less efficient than maximum likelihood.
We suitably formalize these results both in the finite sample regime, and in
the asymptotic regime. Finally, we identify a direct parallel in the discrete
setting, where we connect the statistical properties of pseudolikelihood
estimation with approximate tensorization of entropy and the Glauber dynamics.
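To make the objective concrete: score matching minimizes the Hyvärinen loss $J(\theta) = \mathbb{E}_x\big[\tfrac{1}{2}\|\nabla_x \log p_\theta(x)\|^2 + \Delta_x \log p_\theta(x)\big]$, which depends only on the score and therefore never touches the partition function. The snippet below is a minimal illustrative sketch (not the authors' code), assuming a Gaussian location model whose score is linear in $x$; there the divergence term is a constant and the score matching estimator coincides with maximum likelihood, i.e. the "small isoperimetric constant" regime described above.

```python
# Minimal sketch (illustrative assumption, not the paper's code): empirical
# Hyvarinen score matching loss
#   J(theta) = E[ 0.5 * ||s_theta(x)||^2 + tr(grad_x s_theta(x)) ],
# shown for a Gaussian location model p_theta(x) ∝ exp(-||x - mu||^2 / 2),
# whose score is s_theta(x) = -(x - mu). No normalizing constant is needed.
import numpy as np

def score(x, mu):
    """Model score: gradient of log p_theta at x (linear for this model)."""
    return -(x - mu)

def score_matching_loss(x, mu):
    """Empirical Hyvarinen objective over samples x of shape (n, d)."""
    n, d = x.shape
    s = score(x, mu)                         # (n, d) array of model scores
    sq_norm = 0.5 * np.sum(s ** 2, axis=1)   # 0.5 * ||s_theta(x_i)||^2
    divergence = -d                          # tr(grad_x s_theta) = -d here
    return np.mean(sq_norm + divergence)

rng = np.random.default_rng(0)
d, n = 3, 10_000
mu_true = np.array([1.0, -2.0, 0.5])
samples = rng.normal(loc=mu_true, scale=1.0, size=(n, d))

# For this model the score matching minimizer is the sample mean, so it
# coincides with maximum likelihood (a unit Gaussian has Poincare constant 1).
mu_hat = samples.mean(axis=0)
print("score matching / MLE estimate:", mu_hat)
print("loss at estimate:", score_matching_loss(samples, mu_hat))
print("loss at truth   :", score_matching_loss(samples, mu_true))
```

For distributions with a large isoperimetric (Poincaré) constant, the same objective remains consistent, but, per the abstract above, its statistical efficiency can degrade sharply relative to maximum likelihood.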
Related papers
- Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions [65.38208224389027]
This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible.
Our approach is based on the estimation of the Cumulative Sum statistics, which is known to produce optimal performance.
arXiv Detail & Related papers (2024-10-18T17:13:29Z)
- Statistical Inference in Tensor Completion: Optimal Uncertainty Quantification and Statistical-to-Computational Gaps [7.174572371800217]
This paper presents a simple yet efficient method for statistical inference of tensor linear forms using incomplete and noisy observations.
It is suitable for various statistical inference tasks, including constructing confidence intervals, inference under heteroskedastic and sub-exponential noise, and simultaneous testing.
arXiv Detail & Related papers (2024-10-15T03:09:52Z)
- On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data.
The class of functions used for score estimation consists of Lipschitz continuous functions, avoiding any Lipschitzness assumption on the score function itself.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed, formulating the objective as the logistic loss between the real data and artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Provable benefits of score matching [30.317535687908755]
We give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize.
We show that designing a zeroth-order or first-order oracle for optimizing the likelihood loss is NP-hard.
Minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension.
arXiv Detail & Related papers (2023-06-03T03:42:30Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data [20.79270369203348]
Existing methods mostly focus on subsampling with replacement due to its high computational efficiency.
We first derive optimal subsampling probabilities in the context of quasi-likelihood estimation.
We develop a distributed subsampling framework, in which statistics are computed simultaneously on smaller partitions of the full data.
arXiv Detail & Related papers (2020-05-21T02:46:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.