Iterative Hard Thresholding with Adaptive Regularization: Sparser
Solutions Without Sacrificing Runtime
- URL: http://arxiv.org/abs/2204.08274v1
- Date: Mon, 11 Apr 2022 19:33:15 GMT
- Title: Iterative Hard Thresholding with Adaptive Regularization: Sparser
Solutions Without Sacrificing Runtime
- Authors: Kyriakos Axiotis and Maxim Sviridenko
- Abstract summary: We propose a simple modification to the iterative hard thresholding algorithm, which recoversally sparser solutions as a function of the condition number.
Our proposed algorithm, regularized IHT, returns a solution with sparsity $O(skappa)$.
Our algorithm significantly improves over ARHT which also finds a solution of sparsity $O(skappa)$.
- Score: 17.60502131429094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple modification to the iterative hard thresholding (IHT)
algorithm, which recovers asymptotically sparser solutions as a function of the
condition number. When aiming to minimize a convex function $f(x)$ with
condition number $\kappa$ subject to $x$ being an $s$-sparse vector, the
standard IHT guarantee is a solution with relaxed sparsity $O(s\kappa^2)$,
while our proposed algorithm, regularized IHT, returns a solution with sparsity
$O(s\kappa)$. Our algorithm significantly improves over ARHT which also finds a
solution of sparsity $O(s\kappa)$, as it does not require re-optimization in
each iteration (and so is much faster), is deterministic, and does not require
knowledge of the optimal solution value $f(x^*)$ or the optimal sparsity level
$s$. Our main technical tool is an adaptive regularization framework, in which
the algorithm progressively learns the weights of an $\ell_2$ regularization
term that will allow convergence to sparser solutions. We also apply this
framework to low rank optimization, where we achieve a similar improvement of
the best known condition number dependence from $\kappa^2$ to $\kappa$.
Related papers
- Quantum linear system algorithm with optimal queries to initial state preparation [0.0]
Quantum algorithms for linear systems produce the solution state $A-1|brangle$ by querying two oracles.
We present a quantum linear system algorithm making $mathbfThetaleft (1/sqrtpright)$ queries to $O_b$, which is optimal in the success probability.
In various applications such as solving differential equations, we can further improve the dependence on $p$ using a block preconditioning scheme.
arXiv Detail & Related papers (2024-10-23T18:00:01Z) - Online Convex Optimization with a Separation Oracle [10.225358400539719]
We introduce a new projection-free algorithm for Online Convex Optimization (OCO) with a state-of-the-art regret guarantee.
Our algorithm achieves a regret bound of $widetildeO(sqrtdT + kappa d)$, while requiring only $widetildeO(1) calls to a separation oracle per round.
arXiv Detail & Related papers (2024-10-03T13:35:08Z) - An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing neither smooth nor convex points of Lipschitz objectives which are possibly using only zero-order evaluations.
Our analysis is based on a simple yet powerful.
Goldstein-subdifferential set, which allows recent advancements in.
nonsmooth non optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general non-strongly-concave bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in non-strongly-concave problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Nearly Optimal Policy Optimization with Stable at Any Time Guarantee [53.155554415415445]
Policy-based method in citetshani 2020optimistic is only $tildeO(sqrtSAH3K + sqrtAH4K)$ where $S$ is the number of states, $A$ is the number of actions, $H$ is the horizon, and $K$ is the number of episodes, and there is a $sqrtSH$ gap compared with the information theoretic lower bound $tildeOmega(sqrtSAH
arXiv Detail & Related papers (2021-12-21T01:54:17Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Local Search Algorithms for Rank-Constrained Convex Optimization [7.736462653684946]
We propose greedy and local search algorithms for rank-constrained convex optimization.
We show that if the rank-restricted condition number of $R$ is $kappa$, a solution $A$ with rank $O(r*cdot minkappa log fracR(mathbf0)-R(A*)epsilon, kappa2)$ and $R(A) leq R(A*) + epsilon$ can be recovered, where $A
arXiv Detail & Related papers (2021-01-15T18:52:02Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Sparse Convex Optimization via Adaptively Regularized Hard Thresholding [17.60502131429094]
We present a new Adaptively Regularized Hard Thresholding (ARHT) algorithm that makes significant progress on this problem.
We also provide a new analysis of OMP with Replacement (OMPR) for general $f$, under the condition $s > s* frackappa24$.
arXiv Detail & Related papers (2020-06-25T17:16:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.