Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent
- URL: http://arxiv.org/abs/2011.07142v4
- Date: Tue, 3 May 2022 20:42:27 GMT
- Title: Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent
- Authors: Abhishek Chakraborty, Ketan Rajawat, Alec Koppel
- Abstract summary: We consider expected risk minimization problems when the range of the estimator is required to be nonnegative.
We develop first and second-order variants of stochastic mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson Process intensity estimation in practice.
- Score: 15.340540198612823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider expected risk minimization problems when the range of the
estimator is required to be nonnegative, motivated by the settings of maximum
likelihood estimation (MLE) and trajectory optimization. To facilitate
nonlinear interpolation, we hypothesize that the search space is a Reproducing
Kernel Hilbert Space (RKHS). We develop first and second-order variants of
stochastic mirror descent employing (i) \emph{pseudo-gradients} and (ii)
complexity-reducing projections. Compressive projection in the first-order
scheme is executed via kernel orthogonal matching pursuit (KOMP), which
overcomes the fact that the vanilla RKHS parameterization grows unbounded with
the iteration index in the stochastic setting. Moreover, pseudo-gradients are
needed when gradient estimates for cost are only computable up to some
numerical error, which arises in, e.g., integral approximations. Under constant
step-size and compression budget, we establish tradeoffs between the radius of
convergence of the expected sub-optimality and the projection budget parameter,
as well as non-asymptotic bounds on the model complexity. To refine the
solution's precision, we develop a second-order extension which employs
recursively averaged pseudo-gradient outer-products to approximate the Hessian
inverse, whose convergence in mean is established under an additional
eigenvalue decay condition on the Hessian of the optimal RKHS element, which is
unique to this work. Experiments demonstrate favorable performance on
inhomogeneous Poisson Process intensity estimation in practice.
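To make the first-order scheme more concrete, here is a minimal Python sketch of a nonnegative kernel expansion maintained by a pseudo-gradient step and a compression pass. The Gaussian kernel, the exponentiated-gradient (entropic mirror) weight update, and the threshold pruning that stands in for KOMP are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rbf(x, y, bw=0.5):
    """Nonnegative Gaussian kernel; the bandwidth bw is an assumed hyperparameter."""
    diff = np.atleast_1d(x) - np.atleast_1d(y)
    return np.exp(-np.dot(diff, diff) / (2.0 * bw ** 2))

class PositiveKernelModel:
    """Sketch of a nonnegative RKHS iterate f_t(x) = sum_i w_i k(d_i, x) with w_i > 0.

    Positivity is kept by an entropic-mirror (exponentiated-gradient) update on
    the weights, and a crude pruning pass stands in for the KOMP compression
    used in the paper to keep the dictionary from growing without bound.
    """

    def __init__(self, eta=0.1, eps=1e-4):
        self.eta, self.eps = eta, eps
        self.dictionary, self.w = [], np.empty(0)

    def __call__(self, x):
        return float(sum(wi * rbf(di, x) for di, wi in zip(self.dictionary, self.w)))

    def step(self, x_t, pseudo_grad):
        """pseudo_grad: an inexact derivative of the instantaneous loss with
        respect to f(x_t), e.g. obtained from a numerical integral approximation."""
        # The new sample enters the dictionary with a small positive weight.
        self.dictionary.append(np.atleast_1d(x_t))
        self.w = np.append(self.w, 1e-2)
        # Exponentiated-gradient step: every weight stays strictly positive.
        k_vec = np.array([rbf(d, x_t) for d in self.dictionary])
        self.w = self.w * np.exp(-self.eta * pseudo_grad * k_vec)
        # Stand-in for KOMP: drop atoms whose weight fell below the budget eps.
        keep = self.w > self.eps
        self.dictionary = [d for d, k in zip(self.dictionary, keep) if k]
        self.w = self.w[keep]
```

Because every weight stays strictly positive and the Gaussian kernel is nonnegative, the resulting function is nonnegative everywhere; in the paper the compression error is instead controlled by an explicit budget parameter, which is what produces the stated tradeoff between convergence radius and model complexity.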
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
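As a rough illustration of the two step types, the numpy sketch below computes a Cauchy-style gradient step and an eigen step for a smooth unconstrained model; the stochastic random models and equality constraints handled by the actual method are omitted, and `trust_region_steps` is a hypothetical helper.

```python
import numpy as np

def trust_region_steps(grad, hess, radius):
    """Sketch of the two step types: a Cauchy-style gradient step that minimizes
    the quadratic model m(d) = grad^T d + 0.5 d^T hess d along -grad within the
    trust region, and an eigen step along the most negative eigenvector of hess
    (the reduced Hessian in the constrained setting)."""
    g_norm = max(np.linalg.norm(grad), 1e-12)
    curvature = grad @ hess @ grad
    if curvature > 0:
        alpha = min(g_norm ** 2 / curvature, radius / g_norm)
    else:
        alpha = radius / g_norm                   # model is unbounded along -grad
    gradient_step = -alpha * grad

    eigvals, eigvecs = np.linalg.eigh(hess)
    if eigvals[0] < 0:                            # negative curvature present
        v = eigvecs[:, 0]
        sign = -1.0 if grad @ v > 0 else 1.0      # orient v as a descent direction
        eigen_step = sign * radius * v
    else:
        eigen_step = np.zeros_like(grad)
    return gradient_step, eigen_step
```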
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Parameter-free projected gradient descent [0.0]
We consider the problem of minimizing a convex function over a closed convex set with Projected Gradient Descent (PGD).
We propose a fully parameter-free version of AdaGrad, which is adaptive to the distance between the initialization and the optimum, and to the sum of the squared norms of the subgradients.
Our algorithm is able to handle projection steps and does not involve restarts, reweighting along the trajectory, or additional evaluations compared to classical PGD.
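For reference, a plain (not parameter-free) projected AdaGrad-norm loop looks like the sketch below; the distance estimate `D` and the toy problem are assumptions for illustration, and removing the need to supply `D` is precisely what the paper contributes.

```python
import numpy as np

def projected_adagrad_norm(subgrad, project, x0, D=1.0, n_steps=1000):
    """AdaGrad-norm projected subgradient descent: step size
    D / sqrt(sum of squared subgradient norms so far), followed by a projection
    onto the feasible set. Unlike the paper's method, this variant still needs
    the distance estimate D."""
    x = np.asarray(x0, dtype=float)
    sq_sum = 0.0
    for _ in range(n_steps):
        g = subgrad(x)
        sq_sum += float(np.dot(g, g)) + 1e-12
        x = project(x - (D / np.sqrt(sq_sum)) * g)
    return x

# Hypothetical toy problem: minimize ||x - b||_1 over the Euclidean unit ball.
b = np.array([2.0, -1.0, 0.5])
project_ball = lambda z: z / max(1.0, np.linalg.norm(z))
x_hat = projected_adagrad_norm(lambda x: np.sign(x - b), project_ball, np.zeros(3))
```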
arXiv Detail & Related papers (2023-05-31T07:22:44Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
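A minimal sketch of the divide-and-conquer idea, with ordinary kernel ridge regression (scikit-learn's `KernelRidge`) standing in for the functional linear regression setting; the block count, kernel, and regularization level are assumed hyperparameters.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def divide_and_conquer_krr(X, y, n_blocks=4, alpha=1e-2, seed=0):
    """Divide-and-conquer estimator sketch: fit independent kernel ridge
    regressors on disjoint data blocks and average their predictions."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(y)), n_blocks)
    models = [KernelRidge(kernel="rbf", alpha=alpha).fit(X[idx], y[idx]) for idx in blocks]
    return lambda X_new: np.mean([m.predict(X_new) for m in models], axis=0)
```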
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
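Below is a generic mirror-descent update with the sparsity-friendly mirror map psi(x) = 0.5*||x||_p^2, p = 1 + 1/ln(d), a standard choice in sparse recovery; it is a sketch of the general technique rather than the cited paper's exact algorithm, and `grad_oracle` may return noisy (stochastic) gradients.

```python
import numpy as np

def lp_mirror_descent(grad_oracle, x0, n_steps=500, eta=0.05):
    """Mirror descent with psi(x) = 0.5 * ||x||_p^2, p = 1 + 1/ln(d).

    The update moves in the dual space and maps back with the conjugate norm."""
    x = np.asarray(x0, dtype=float)
    d = x.size
    p = 1.0 + 1.0 / np.log(d)          # assumes d >= 2
    q = p / (p - 1.0)

    def grad_half_norm_sq(z, r):
        # Gradient of 0.5 * ||z||_r^2; with r = q this is the inverse mirror map.
        nrm = np.linalg.norm(z, ord=r)
        if nrm == 0.0:
            return np.zeros_like(z)
        return np.sign(z) * np.abs(z) ** (r - 1.0) * nrm ** (2.0 - r)

    for _ in range(n_steps):
        theta = grad_half_norm_sq(x, p) - eta * grad_oracle(x)   # dual-space step
        x = grad_half_norm_sq(theta, q)                          # map back to primal
    return x
```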
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order optimization of nonconvex composite objectives, focusing on reducing the dependence on dimensionality.
This is achieved by exploiting the low-dimensional structure of the decision set using the mirror descent method with an entropy-like distance-generating function.
To improve the gradient estimate, we replace the classic sampling method based on the Rademacher distribution and show that the mini-batch method copes with the non-Euclidean geometry.
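As a reference point, a standard two-point zeroth-order gradient estimator with Rademacher directions and mini-batch averaging is sketched below (a generic construction, not the cited paper's refined estimator).

```python
import numpy as np

def zo_gradient_estimate(f, x, delta=1e-3, batch=8, seed=0):
    """Two-point zeroth-order gradient estimate averaged over a mini-batch of
    Rademacher directions u in {-1, +1}^d:
        g = mean over u of (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u
    """
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x, dtype=float)
    for _ in range(batch):
        u = rng.integers(0, 2, size=x.size) * 2.0 - 1.0   # Rademacher direction
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / batch
```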
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
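A minimal sketch of SEG with iterate averaging and scheduled restarts on the unconstrained bilinear game min_x max_y x^T A y; the game matrix, step size, and Gaussian noise model are illustrative assumptions.

```python
import numpy as np

def seg_bilinear(A, n_restarts=4, inner_steps=500, eta=0.05, sigma=0.01, seed=0):
    """Stochastic extragradient (SEG) on min_x max_y x^T A y with iterate
    averaging and a scheduled restart from the averaged point."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x, y = np.ones(d), np.ones(d)
    noise = lambda: sigma * rng.standard_normal(d)
    for _ in range(n_restarts):
        x_avg, y_avg = np.zeros(d), np.zeros(d)
        for _ in range(inner_steps):
            # Extrapolation step with noisy gradients.
            x_half = x - eta * (A @ y + noise())
            y_half = y + eta * (A.T @ x + noise())
            # Update step uses the gradients evaluated at the extrapolated point.
            x = x - eta * (A @ y_half + noise())
            y = y + eta * (A.T @ x_half + noise())
            x_avg += x / inner_steps
            y_avg += y / inner_steps
        x, y = x_avg, y_avg            # restart from the averaged iterate
    return x, y
```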
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
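The clipping operation itself is simple; the sketch below applies it inside a plain constant-step SGD loop, whereas the paper's contribution lies in the stepsize rules paired with clipping.

```python
import numpy as np

def clipped_sgd(grad_oracle, x0, eta=0.01, clip_level=1.0, n_steps=1000):
    """SGD with gradient clipping: any stochastic gradient whose norm exceeds
    clip_level is rescaled onto the ball of that radius before the step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_oracle(x)
        g_norm = np.linalg.norm(g)
        if g_norm > clip_level:
            g = g * (clip_level / g_norm)   # clip heavy-tailed gradients
        x = x - eta * g
    return x
```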
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
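A generic illustration of coordinate importance sampling inside a zeroth-order gradient estimate; the sampling distribution `probs` and the inverse-probability reweighting are assumptions for illustration, not the cited framework's design.

```python
import numpy as np

def zo_coordinate_is_gradient(f, x, probs, delta=1e-3, n_samples=4, seed=0):
    """Zeroth-order gradient estimate by coordinate-wise finite differences with
    importance sampling: coordinate i is queried with probability probs[i] and the
    partial derivative is reweighted by 1/probs[i] so the estimate stays unbiased.

    probs: length-d array of sampling probabilities over the coordinates (sums to 1).
    """
    rng = np.random.default_rng(seed)
    d = x.size
    g = np.zeros(d)
    for _ in range(n_samples):
        i = rng.choice(d, p=probs)
        e = np.zeros(d)
        e[i] = 1.0
        partial = (f(x + delta * e) - f(x - delta * e)) / (2.0 * delta)
        g[i] += partial / (probs[i] * n_samples)
    return g

# E.g., uniform sampling: probs = np.ones(d) / d; an adaptive scheme would bias
# probs toward coordinates that mattered in earlier iterations.
```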
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Stochastic Zeroth-order Riemannian Derivative Estimation and
Optimization [15.78743548731191]
We propose an oracle version of the Gaussian smoothing function to overcome the difficulty of the non-linearity of the manifold.
We demonstrate the applicability of our algorithms through numerical results and real-world applications, including black-box stiffness control for robotics and black-box attacks on neural networks.
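As a toy instance of zeroth-order optimization on a manifold, the sketch below performs one Gaussian-smoothing descent step on the unit sphere; the tangent-space projection and the normalization retraction are specific to the sphere and stand in for the paper's general Riemannian oracle.

```python
import numpy as np

def zo_sphere_step(f, x, mu=1e-3, eta=0.1, seed=0):
    """One zeroth-order descent step on the unit sphere: a Gaussian direction is
    projected onto the tangent space at x, the directional derivative is estimated
    by a finite difference along the retraction, and the update is retracted back."""
    rng = np.random.default_rng(seed)
    retract = lambda z: z / np.linalg.norm(z)      # normalization retraction
    u = rng.standard_normal(x.size)
    u -= (u @ x) * x                               # tangent-space projection at x (||x|| = 1)
    g_est = (f(retract(x + mu * u)) - f(x)) / mu * u
    return retract(x - eta * g_est)
```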
arXiv Detail & Related papers (2020-03-25T06:58:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.