Stochastic Learning for Sparse Discrete Markov Random Fields with
Controlled Gradient Approximation Error
- URL: http://arxiv.org/abs/2005.06083v1
- Date: Tue, 12 May 2020 22:48:42 GMT
- Title: Stochastic Learning for Sparse Discrete Markov Random Fields with
Controlled Gradient Approximation Error
- Authors: Sinong Geng, Zhaobin Kuang, Jie Liu, Stephen Wright, David Page
- Abstract summary: We study the $L_$-regularized maximum likelihood estimator/estimation (MLE) problem for discrete Markov random fields (MRFs)
To address these challenges, we consider a verifiable learning framework called proximal gradient (SPG)
We provide novel verifiable bounds to inspect and control the quality of the gradient approximation.
- Score: 10.381976180143328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the $L_1$-regularized maximum likelihood estimator/estimation (MLE)
problem for discrete Markov random fields (MRFs), where efficient and scalable
learning requires both sparse regularization and approximate inference. To
address these challenges, we consider a stochastic learning framework called
stochastic proximal gradient (SPG; Honorio 2012a, Atchade et al.
2014,Miasojedow and Rejchel 2016). SPG is an inexact proximal gradient
algorithm [Schmidtet al., 2011], whose inexactness stems from the stochastic
oracle (Gibbs sampling) for gradient approximation - exact gradient evaluation
is infeasible in general due to the NP-hard inference problem for discrete MRFs
[Koller and Friedman, 2009]. Theoretically, we provide novel verifiable bounds
to inspect and control the quality of gradient approximation. Empirically, we
propose the tighten asymptotically (TAY) learning strategy based on the
verifiable bounds to boost the performance of SPG.
Related papers
- Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quant clipping strategy for Gradient Descent (SGD)
We use gradient new outliers as norm clipping chains.
We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails.
We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles are in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation.
arXiv Detail & Related papers (2020-02-26T12:53:04Z) - Stochastic Approximate Gradient Descent via the Langevin Algorithm [11.36635610546803]
We introduce the approximate gradient descent (SAGD) as an alternative to the gradient descent for cases where unbiased gradients cannot be trivially obtained.
We show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and the variational autoencoders.
arXiv Detail & Related papers (2020-02-13T14:29:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.