Concentration Inequalities for the Stochastic Optimization of Unbounded Objectives with Application to Denoising Score Matching
- URL: http://arxiv.org/abs/2502.08628v1
- Date: Wed, 12 Feb 2025 18:30:36 GMT
- Title: Concentration Inequalities for the Stochastic Optimization of Unbounded Objectives with Application to Denoising Score Matching
- Authors: Jeremiah Birrell
- Abstract summary: We derive concentration inequalities that bound the statistical error for a large class of optimization problems.
Our results establish the benefit of sample reuse in algorithms that employ easily sampled auxiliary random variables.
- Score: 5.022028859839544
- License:
- Abstract: We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations utilize the following tools: 1) A new form of McDiarmid's inequality that is based on sample dependent one component difference bounds and which leads to a novel uniform law of large numbers result for unbounded functions. 2) A Rademacher complexity bound for families of functions that satisfy an appropriate local Lipschitz property. As an application of these results, we derive statistical error bounds for denoising score matching (DSM), an application that inherently requires one to consider unbounded objective functions, even when the data distribution has bounded support. In addition, our results establish the benefit of sample reuse in algorithms that employ easily sampled auxiliary random variables in addition to the training data, e.g., as in DSM, which uses auxiliary Gaussian random variables.
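To make the DSM application concrete, the following is a minimal sketch (not the authors' code) of the empirical denoising score matching objective with auxiliary Gaussian random variables, illustrating the sample-reuse setting in which each of the n training points is paired with m independent Gaussian draws; the names score_fn, sigma, and m are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the empirical DSM objective with auxiliary Gaussian samples
# (illustrative only; parameter names are assumptions, not from the paper).
import numpy as np

def empirical_dsm_loss(score_fn, data, sigma=0.1, m=8, seed=None):
    """Monte Carlo estimate of E||score_fn(x + sigma*z) + z/sigma||^2,
    reusing each data point x across m auxiliary draws z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    total = 0.0
    for _ in range(m):                      # reuse the same n training samples
        z = rng.standard_normal((n, d))     # auxiliary Gaussian variables
        x_noisy = data + sigma * z
        residual = score_fn(x_noisy) + z / sigma  # DSM regression target is -z/sigma
        total += np.mean(np.sum(residual**2, axis=1))
    return total / m

# Toy usage: a linear "score model" on synthetic 2-D data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 2))
    W = -np.eye(2)                          # score of a standard Gaussian is -x
    print(empirical_dsm_loss(lambda x: x @ W.T, X, sigma=0.5, m=16, seed=1))
```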
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests [5.3595271893779906]
Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations.
We study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints.
arXiv Detail & Related papers (2024-06-10T19:25:19Z) - An Inexact Halpern Iteration with Application to Distributionally Robust Optimization [9.529117276663431]
We investigate inexact variants of the scheme in both deterministic and stochastic convergence settings.
We show that by choosing the inexactness appropriately, the inexact schemes admit an $O(k^{-1})$ convergence rate in terms of the (expected) residue norm.
arXiv Detail & Related papers (2024-02-08T20:12:47Z) - On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data.
Our class of functions used for score estimation consists of Lipschitz continuous functions, avoiding any Lipschitzness assumption on the score function itself.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Amortized variance reduction for doubly stochastic objectives [17.064916635597417]
Approximate inference in complex probabilistic models requires optimisation of doubly stochastic objective functions.
Current approaches do not take into account how mini-batch stochasticity affects sampling stochasticity, resulting in sub-optimal variance reduction.
We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional gradient computations (a generic control-variate sketch follows this list).
arXiv Detail & Related papers (2020-03-09T13:23:14Z)
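The variance-reduction idea in the last entry rests on control variates. The sketch below is a generic illustration of control-variate variance reduction for a noisy scalar estimate, assuming a control variate with known mean; it is not the cited paper's recognition-network construction.

```python
# Generic control-variate variance reduction (illustrative; not the cited
# paper's recognition-network method). The estimator mean(g - c*(h - E[h]))
# is unbiased for E[g] and has lower variance when h is correlated with g.
import numpy as np

def cv_estimate(g_samples, h_samples, h_mean):
    """Return the control-variate estimate of E[g] and the fitted coefficient
    c = Cov(g, h) / Var(h), which minimizes the estimator's variance."""
    C = np.cov(g_samples, h_samples)        # 2x2 sample covariance matrix
    c = C[0, 1] / C[1, 1]
    return np.mean(g_samples - c * (h_samples - h_mean)), c

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
g = np.exp(0.1 * z)                         # noisy quantity whose mean we want
h = z                                       # control variate with known mean E[h] = 0
print(f"plain={np.mean(g):.5f}  control-variate={cv_estimate(g, h, 0.0)[0]:.5f}")
```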
This list is automatically generated from the titles and abstracts of the papers on this site.