GO Hessian for Expectation-Based Objectives
- URL: http://arxiv.org/abs/2006.08873v1
- Date: Tue, 16 Jun 2020 02:20:41 GMT
- Title: GO Hessian for Expectation-Based Objectives
- Authors: Yulai Cong, Miaoyun Zhao, Jianqiao Li, Junya Chen, Lawrence Carin
- Abstract summary: GO gradient was proposed recently for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$.
Based on the GO gradient, we present for $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$ an unbiased low-variance Hessian estimator, named GO Hessian.
- Score: 73.06986780804269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An unbiased low-variance gradient estimator, termed GO gradient, was proposed
recently for expectation-based objectives
$\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})]$,
where the random variable (RV) $\boldsymbol{y}$ may be drawn from a stochastic
computation graph with continuous (non-reparameterizable) internal nodes and
continuous/discrete leaves. Upgrading the GO gradient, we present for
$\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}
[f(\boldsymbol{y})]$ an unbiased low-variance Hessian estimator, named GO
Hessian. Considering practical implementation, we reveal that GO Hessian is
easy to use with auto-differentiation and Hessian-vector products, enabling
efficient, cheap exploitation of curvature information over stochastic
computation graphs. As representative examples, we present the GO Hessian for
non-reparameterizable gamma and negative binomial RVs/nodes. Based on the GO
Hessian, we design a new second-order method for
$\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}
[f(\boldsymbol{y})]$, with rigorous experiments conducted to verify its
effectiveness and efficiency.
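The abstract emphasizes that the GO Hessian is easy to use through auto-differentiation and Hessian-vector products. The sketch below illustrates only that generic mechanic in Python/PyTorch: a Hessian-vector product, with respect to the distribution parameters $\boldsymbol{\gamma}$, of a Monte Carlo estimate of $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. The Gaussian $q_{\boldsymbol{\gamma}}$, the toy integrand $f$, and the helper names are illustrative assumptions; this is not the paper's GO estimator, which specifically targets non-reparameterizable nodes such as gamma and negative binomial random variables.

```python
# Minimal sketch (assumptions, not the paper's GO Hessian): Hessian-vector
# product of a Monte Carlo estimate of E_{q_gamma(y)}[f(y)] w.r.t. gamma,
# using a reparameterizable Gaussian q_gamma as a stand-in for the
# non-reparameterizable nodes that GO gradients/Hessians actually handle.
import torch

def expected_objective(gamma, n_samples=64):
    """Monte Carlo estimate of E_{q_gamma(y)}[f(y)] with q_gamma = N(mu, sigma^2)."""
    mu, log_sigma = gamma
    eps = torch.randn(n_samples)
    y = mu + torch.exp(log_sigma) * eps      # reparameterized samples y ~ q_gamma
    f = (y - 2.0) ** 2                       # toy integrand f(y)
    return f.mean()

def hessian_vector_product(loss_fn, gamma, v):
    """Compute (d^2 L / d gamma^2) @ v by double backward, without forming the Hessian."""
    gamma = gamma.detach().requires_grad_(True)
    loss = loss_fn(gamma)
    grad, = torch.autograd.grad(loss, gamma, create_graph=True)
    hvp, = torch.autograd.grad(grad @ v, gamma)
    return hvp

gamma = torch.tensor([0.0, 0.0])             # gamma = (mu, log_sigma)
v = torch.tensor([1.0, 0.0])                 # direction for the Hessian-vector product
print(hessian_vector_product(expected_objective, gamma, v))
```

In a second-order method of the kind the abstract describes, such Hessian-vector products would typically be passed to a conjugate-gradient or trust-region solver rather than forming the Hessian explicitly.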
Related papers
- Targeted Variance Reduction: Robust Bayesian Optimization of Black-Box
Simulators with Noise Parameters [1.7404865362620803]
We propose a new Bayesian optimization method called Targeted Variance Reduction (TVR)
TVR leverages a novel joint acquisition function over $(\mathbf{x}, \boldsymbol{\theta})$, which targets variance reduction on the objective within the desired region of improvement.
We demonstrate the improved performance of TVR over the state-of-the-art in a suite of numerical experiments and an application to the robust design of automobile brake discs.
arXiv Detail & Related papers (2024-03-06T16:03:37Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Stochastic Zeroth Order Gradient and Hessian Estimators: Variance
Reduction and Refined Bias Bounds [6.137707924685666]
We study zeroth-order gradient and Hessian estimators for real-valued functions in $\mathbb{R}^n$.
We show that, by taking finite differences along random directions, the variance of finite-difference gradient estimators can be significantly reduced.
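For context, here is a minimal sketch of the generic random-direction finite-difference gradient estimator that this summary alludes to; the test function, smoothing parameter, and Gaussian directions are assumptions for illustration, not the paper's variance-reduced construction or its bias/variance analysis.

```python
# Minimal sketch (illustrative assumptions, not the paper's estimator): the
# standard random-direction finite-difference gradient estimator in R^n.
import numpy as np

def fd_gradient(f, x, mu=1e-4, num_dirs=32, seed=0):
    """Average forward differences (f(x + mu*v) - f(x)) / mu along Gaussian directions v."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        v = rng.standard_normal(x.shape)     # random direction
        g += (f(x + mu * v) - fx) / mu * v   # one directional-difference term
    return g / num_dirs

# toy check on f(x) = ||x||^2, whose true gradient is 2x
x = np.array([1.0, -2.0, 0.5])
print(fd_gradient(lambda z: float(np.dot(z, z)), x))
```

Averaging over more random directions lowers the variance of the estimate at the cost of extra function evaluations.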
arXiv Detail & Related papers (2022-05-29T18:53:24Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - Polyak-Ruppert Averaged Q-Learning is Statistically Efficient [90.14768299744792]
We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a. averaged Q-learning) in a $\gamma$-discounted MDP.
We establish normality for the iteration-averaged $\bar{\boldsymbol{Q}}_T$.
In short, our theoretical analysis shows averaged Q-learning is statistically efficient.
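As a rough, hedged illustration of the object being analyzed (not the paper's setting, step-size schedule, or guarantees), the following sketch runs synchronous tabular Q-learning with Polyak-Ruppert iterate averaging on a toy randomly generated MDP.

```python
# Hedged sketch: synchronous tabular Q-learning with Polyak-Ruppert (iterate)
# averaging on a toy random MDP. The MDP, step-size schedule, and horizon are
# assumptions for illustration, not the paper's setting or analysis.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, T = 5, 3, 0.9, 5000
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.random((S, A))                       # deterministic reward table r(s, a)

Q = np.zeros((S, A))
Q_bar = np.zeros((S, A))                     # running Polyak-Ruppert average of the iterates
for t in range(1, T + 1):
    # synchronous update: every (s, a) pair draws one next state from the generative model
    s_next = np.array([[rng.choice(S, p=P[s, a]) for a in range(A)] for s in range(S)])
    target = R + gamma * Q[s_next].max(axis=-1)
    eta = 1.0 / (1.0 + (1.0 - gamma) * t)    # one common rescaled-linear step size (an assumption)
    Q += eta * (target - Q)
    Q_bar += (Q - Q_bar) / t                 # incremental mean: Q_bar_T = (1/T) * sum_t Q_t

print(Q_bar.max(axis=1))                     # value estimates from the averaged iterate
```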
arXiv Detail & Related papers (2021-12-29T14:47:56Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Tree-Projected Gradient Descent for Estimating Gradient-Sparse
Parameters on Graphs [10.846572437131872]
We study estimation of a gradient-sparse parameter vector $\boldsymbol{\theta}^* \in \mathbb{R}^p$.
We show that, under suitable restricted strong convexity and smoothness assumptions for the loss, the resulting estimator achieves the squared-error risk $\frac{s^*}{n} \log(1+\frac{p}{s^*})$ up to a multiplicative constant that is independent of $G$.
arXiv Detail & Related papers (2020-05-31T20:08:13Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Stochastic Recursive Gradient Descent Ascent for Stochastic
Nonconvex-Strongly-Concave Minimax Problems [36.645753881826955]
In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction.
This method achieves the best known complexity for this problem.
arXiv Detail & Related papers (2020-01-11T09:05:03Z)