Untangling Lariats: Subgradient Following of Variationally Penalized Objectives
- URL: http://arxiv.org/abs/2405.04710v1
- Date: Tue, 7 May 2024 23:08:24 GMT
- Title: Untangling Lariats: Subgradient Following of Variationally Penalized Objectives
- Authors: Kai-Chia Mo, Shai Shalev-Shwartz, Nisæl Shártov
- Abstract summary: We derive, as special cases of our apparatus, known algorithms for the fused lasso and isotonic regression.
Last but not least, we derive lattice-based subgradient solvers for variational penalties characterized through the output of arbitrary convolutional filters.
- Score: 10.043139484808949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a novel subgradient following apparatus for calculating the optimum of convex problems with variational penalties. In this setting, we receive a sequence $y_1,\ldots,y_n$ and seek a smooth sequence $x_1,\ldots,x_n$. The smooth sequence attains the minimum Bregman divergence to an input sequence with additive variational penalties in the general form of $\sum_i g_i(x_{i+1}-x_i)$. We derive, as special cases of our apparatus, known algorithms for the fused lasso and isotonic regression. Our approach also facilitates new variational penalties such as non-smooth barrier functions. We next derive and analyze multivariate problems in which $\mathbf{x}_i,\mathbf{y}_i\in\mathbb{R}^d$ and variational penalties that depend on $\|\mathbf{x}_{i+1}-\mathbf{x}_i\|$. The norms we consider are $\ell_2$ and $\ell_\infty$ which promote group sparsity. Last but not least, we derive a lattice-based subgradient following for variational penalties characterized through the output of arbitrary convolutional filters. This paradigm yields efficient solvers for problems in which sparse high-order discrete derivatives such as acceleration and jerk are desirable.
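To make the setting concrete, below is a minimal sketch of its best-known special case, the 1-D fused lasso: the squared-error Bregman divergence with the absolute-difference variational penalty $g_i(x_{i+1}-x_i)=\lambda|x_{i+1}-x_i|$, minimized by plain subgradient descent. The function name `fused_lasso_subgradient`, the step-size schedule, and the iteration budget are illustrative assumptions, not the paper's subgradient-following apparatus (which computes the exact optimum).

```python
import numpy as np

def fused_lasso_subgradient(y, lam=1.0, steps=5000, lr=0.1):
    """Approximately minimize 0.5 * sum_i (x_i - y_i)^2 + lam * sum_i |x_{i+1} - x_i|
    by plain subgradient descent (illustrative only; not the paper's exact solver)."""
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for t in range(1, steps + 1):
        g = x - y                   # gradient of the squared-error term
        d = np.sign(np.diff(x))     # subgradient of |x_{i+1} - x_i| w.r.t. each difference
        g[1:] += lam * d            # contribution of the penalty to x_{i+1}
        g[:-1] -= lam * d           # contribution of the penalty to x_i
        x -= (lr / np.sqrt(t)) * g  # diminishing step size, standard for subgradient methods
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.repeat([0.0, 2.0, -1.0], 50)             # piecewise-constant signal
    noisy = truth + 0.3 * rng.standard_normal(truth.size)
    smooth = fused_lasso_subgradient(noisy, lam=0.5)
    print(np.round(smooth[:10], 2))
```

Exact linear-time solvers for this special case exist (e.g., dynamic-programming or taut-string constructions); the sketch only spells out the objective and a valid subgradient at each point.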
Related papers
- Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints [12.624604051853657]
We propose smoothed primal-dual algorithms for solving nonconvex optimization problems with linear inequality constraints.
Our algorithms are single-loop and use one stochastic gradient sample at each iteration.
Unlike existing methods, our algorithms are free of subproblem solvers, large batch sizes, or increasing penalty parameters, and use dual variable updates to ensure feasibility.
arXiv Detail & Related papers (2025-04-10T09:59:43Z) - Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Stochastic Halpern iteration in normed spaces and applications to reinforcement learning [0.30693357740321775]
We show that if the underlying oracle is uniformly bounded, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$.
We propose new synchronous algorithms for average reward and discounted reward Markov decision processes.
arXiv Detail & Related papers (2024-03-19T01:07:35Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - High Probability Guarantees for Random Reshuffling [5.663909018247509]
We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling nonconvex optimization problems.
In this work, we first investigate the sample complexity of $\mathsf{RR}$'s sampling procedure.
Then, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that involves an additional randomized perturbation procedure near stationary points.
arXiv Detail & Related papers (2023-11-20T15:17:20Z) - First and zeroth-order implementations of the regularized Newton method
with lazy approximated Hessians [4.62316736194615]
We develop first-order (Hessian-free) and zeroth-order (derivative-free) implementations of the regularized Newton method for solving general non-convex optimization problems.
We also equip our algorithms with the lazy bound update that reuses a previously computed Hessian approximation matrix for several iterations.
arXiv Detail & Related papers (2023-09-05T17:40:54Z) - A Fast Algorithm for Adaptive Private Mean Estimation [5.090363690988394]
We design an $(\varepsilon, \delta)$-differentially private algorithm that is adaptive to $\Sigma$.
The estimator achieves optimal rates of convergence with respect to the induced Mahalanobis norm $\|\cdot\|_\Sigma$.
arXiv Detail & Related papers (2023-01-17T18:44:41Z) - Sparse Bayesian Lasso via a Variable-Coefficient $\ell_1$ Penalty [0.9176056742068814]
We define learnable penalty weights $\lambda_p$ with hyperpriors.
We study the theoretical properties of this variable-coefficient $\ell_1$ penalty in the context of penalized likelihood.
We develop a model we call the Sparse Bayesian Lasso, which allows behavior qualitatively like Lasso regression to be applied to arbitrary variational models.
arXiv Detail & Related papers (2022-11-09T18:19:23Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ oracle calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Private Isotonic Regression [54.32252900997422]
We consider the problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function.
We obtain a pure-DP algorithm that has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
We show that the bounds above are essentially the best that can be achieved.
arXiv Detail & Related papers (2022-10-27T05:08:07Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Smoothed Analysis with Adaptive Adversaries [28.940767013349408]
We prove novel algorithmic guarantees for several online problems in the smoothed analysis model.
In this model, at each time an adversary chooses an input distribution with density function bounded above by $\tfrac{1}{\sigma}$ times that of the uniform distribution.
Our results hold for adaptive adversaries that can choose an input distribution based on the decisions of the algorithm and the realizations of the inputs in the previous time steps.
arXiv Detail & Related papers (2021-02-16T20:54:49Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)