Block majorization-minimization with diminishing radius for constrained
nonconvex optimization
- URL: http://arxiv.org/abs/2012.03503v5
- Date: Tue, 7 Nov 2023 04:45:38 GMT
- Title: Block majorization-minimization with diminishing radius for constrained
nonconvex optimization
- Authors: Hanbaek Lyu and Yuchen Li
- Abstract summary: Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that minimizes majorizing surrogates of the objective in each block.
We show that BMM can produce an $\epsilon$-stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations when convex surrogates are used.
- Score: 9.907540661545328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Block majorization-minimization (BMM) is a simple iterative algorithm for
nonconvex constrained optimization that sequentially minimizes majorizing
surrogates of the objective function in each block coordinate while the other
coordinates are held fixed. BMM entails a large class of optimization
algorithms such as block coordinate descent and its proximal-point variant,
expectation-minimization, and block projected gradient descent. We establish
that for general constrained nonconvex optimization, BMM with strongly convex
surrogates can produce an $\epsilon$-stationary point within
$O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations and asymptotically
converges to the set of stationary points. Furthermore, we propose a
trust-region variant of BMM that can handle surrogates that are only convex and
still obtain the same iteration complexity and asymptotic stationarity. These
results hold robustly even when the convex sub-problems are inexactly solved as
long as the optimality gaps are summable. As an application, we show that a
regularized version of the celebrated multiplicative update algorithm for
nonnegative matrix factorization by Lee and Seung has iteration complexity of
$O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$. The same result holds for a wide
class of regularized nonnegative tensor decomposition algorithms as well as the
classical block projected gradient descent algorithm. These theoretical results
are validated through various numerical experiments.
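The abstract notes that the Lee–Seung multiplicative update for NMF is an instance of BMM: each block update exactly minimizes a majorizing surrogate of the Frobenius reconstruction error in one factor while the other is held fixed. A minimal sketch of the classical (unregularized) updates follows; the function name and parameters are illustrative, and this is the plain Lee–Seung scheme, not the regularized variant whose iteration complexity the paper analyzes.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-10, seed=0):
    """Classical Lee--Seung multiplicative updates for NMF, viewed as
    block majorization-minimization: each factor update minimizes a
    majorizing surrogate of ||X - W H||_F^2 in that block."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Block 1: update H with W fixed; the multiplicative step is the
        # exact minimizer of a quadratic majorizer of the loss in H.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Block 2: update W with H fixed, by the symmetric argument.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because each step only rescales nonnegative entries by nonnegative ratios, the iterates stay in the feasible (nonnegative) set automatically, which is why the update fits the constrained BMM template without an explicit projection.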
Related papers
- On Linear Convergence in Smooth Convex-Concave Bilinearly-Coupled Saddle-Point Optimization: Lower Bounds and Optimal Algorithms [17.227158587717934]
We revisit the smooth convex-concave bilinearly-coupled saddle-point problem of the form $\min_x \max_y f(x) + \langle y, \mathbf{B}x \rangle - g(y)$.
We develop the first lower complexity bounds and matching optimal linearly converging algorithms for this problem class.
arXiv Detail & Related papers (2024-11-21T22:06:25Z) - Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) gradient estimators for complex gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^{2}}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization [18.425648833592312]
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained block-Riemannian optimization.
Our analysis does not make explicit use of Euclidean constraints.
arXiv Detail & Related papers (2023-12-16T05:40:19Z) - Nonsmooth Projection-Free Optimization with Functional Constraints [12.20060970177791]
This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization.
Our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints.
Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients.
arXiv Detail & Related papers (2023-11-18T23:06:33Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for multi-level composition optimization.
We show that the number of calls to the stochastic first-order oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - Stochastic regularized majorization-minimization with weakly convex and
multi-convex surrogates [0.0]
We show that the first-order optimality gap of the proposed algorithm decays at a rate matching that of the expected loss under a possibly dependent data setting.
We obtain first-order convergence guarantees for various methods in this setting.
arXiv Detail & Related papers (2022-01-05T15:17:35Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^{2}/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)