Complexity of Block Coordinate Descent with Proximal Regularization and
Applications to Wasserstein CP-dictionary Learning
- URL: http://arxiv.org/abs/2306.02420v1
- Date: Sun, 4 Jun 2023 17:52:49 GMT
- Title: Complexity of Block Coordinate Descent with Proximal Regularization and
Applications to Wasserstein CP-dictionary Learning
- Authors: Dohyun Kwon, Hanbaek Lyu
- Abstract summary: We consider the block coordinate descent of Gauss-Seidel type with proximal regularization (BCD-PR).
We show that for general nonconvex smooth objectives with block-wise constraints, BCD-PR converges to an $\epsilon$-stationary point within $O(1/\epsilon)$ iterations.
- Score: 1.4010916616909743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the block coordinate descent methods of Gauss-Seidel type with
proximal regularization (BCD-PR), which is a classical method of minimizing
general nonconvex objectives under constraints that has a wide range of
practical applications. We theoretically establish the worst-case complexity
bound for this algorithm. Namely, we show that for general nonconvex smooth
objectives with block-wise constraints, the classical BCD-PR algorithm
converges to an $\epsilon$-stationary point within $O(1/\epsilon)$ iterations. Under
a mild condition, this result still holds even if the algorithm is executed
inexactly in each step. As an application, we propose a provable and efficient
algorithm for `Wasserstein CP-dictionary learning', which seeks a set of
elementary probability distributions that can well-approximate a given set of
d-dimensional joint probability distributions. Our algorithm is a version of
BCD-PR that operates in the dual space, where the primal problem is regularized
both entropically and proximally.
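To make the BCD-PR update concrete, below is a minimal numpy sketch of a Gauss-Seidel sweep with inexact (gradient-step) block updates. The toy least-squares objective, two-block partition, and parameter values are illustrative assumptions; block-wise constraints and the Wasserstein dual formulation are omitted.

```python
import numpy as np

def bcd_pr(f_grad, blocks, x0, lam=1.0, lr=1e-3, n_outer=100, n_inner=20):
    """Gauss-Seidel block coordinate descent with proximal regularization.

    Each sweep visits the blocks in order and inexactly minimizes
        f(x) + (lam / 2) * ||x_b - x_b_prev||^2
    over block b, here by a few gradient steps on that block."""
    x = x0.copy()
    for _ in range(n_outer):
        for b in blocks:                      # Gauss-Seidel order
            anchor = x[b].copy()              # proximal center for this block
            for _ in range(n_inner):          # inexact inner solve
                g = f_grad(x)[b] + lam * (x[b] - anchor)
                x[b] -= lr * g
    return x

# Toy usage (illustrative): least squares split into two blocks.
rng = np.random.default_rng(0)
A, y = rng.standard_normal((30, 10)), rng.standard_normal(30)
f_grad = lambda x: 2.0 * A.T @ (A @ x - y)    # gradient of ||Ax - y||^2
blocks = [np.arange(0, 5), np.arange(5, 10)]
lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2 + 1.0)  # stable step size
x_hat = bcd_pr(f_grad, blocks, np.zeros(10), lam=1.0, lr=lr)
print("residual:", np.linalg.norm(A @ x_hat - y))
```

Replacing the inner gradient loop with an exact block minimizer recovers the textbook BCD-PR update; per the abstract, the complexity bound tolerates such inexact inner solves under a mild condition.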
Related papers
- Bregman-divergence-based Arimoto-Blahut algorithm [53.64687146666141]
We generalize the Arimoto-Blahut algorithm to a general function defined over a Bregman-divergence system.
We propose a convex-optimization-free algorithm that can be applied to classical and quantum rate-distortion theory.
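As background, the Bregman divergence underlying such a system has the standard form below (generic generator $\varphi$; the notation is an assumption, not quoted from the paper):

```latex
% Bregman divergence generated by a strictly convex, differentiable \varphi
D_{\varphi}(x, y) = \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle \ \ge\ 0.
% On the probability simplex, \varphi(x) = \sum_i x_i \log x_i
% recovers the Kullback-Leibler divergence D_{\mathrm{KL}}(x \,\|\, y).
```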
arXiv Detail & Related papers (2024-08-10T06:16:24Z)
- Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization [18.425648833592312]
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained block-Riemannian optimization.
Our analysis makes explicit use of Euclidean constraints.
arXiv Detail & Related papers (2023-12-16T05:40:19Z)
- A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning [9.628032156001073]
We propose two policy Newton algorithms that incorporate cubic regularization.
Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function.
In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
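The likelihood ratio (score function) method mentioned above estimates gradients and Hessians from sampled trajectories via the standard identities below (generic policy-gradient notation assumed, not the paper's):

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\tau \sim \pi_{\theta}}\!\left[ R(\tau)\, \nabla_{\theta} \log p_{\theta}(\tau) \right],
\qquad
\nabla^{2}_{\theta} J(\theta)
  = \mathbb{E}_{\tau \sim \pi_{\theta}}\!\left[ R(\tau)\!\left( \nabla^{2}_{\theta} \log p_{\theta}(\tau)
      + \nabla_{\theta} \log p_{\theta}(\tau)\, \nabla_{\theta} \log p_{\theta}(\tau)^{\top} \right) \right]
```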
arXiv Detail & Related papers (2023-04-21T13:43:06Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$ additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as a subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
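The paper's method is extragradient-based; as background on what entropic regularization buys computationally, here is a minimal Sinkhorn-style sketch of entropy-regularized OT between two discrete distributions (a swapped-in baseline, not the paper's algorithm; all names and parameters are illustrative):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, n_iters=500):
    """Entropy-regularized OT: min_P <P, C> - eps * H(P)
    s.t. P @ 1 = mu, P.T @ 1 = nu, via Sinkhorn matrix scaling."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):          # alternate marginal-matching scalings
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan with marginals mu, nu
    return (P * C).sum()              # regularized OT cost

# Toy usage: OT between two random distributions on 40 grid points.
rng = np.random.default_rng(1)
mu = rng.random(40); mu /= mu.sum()
nu = rng.random(40); nu /= nu.sum()
x = np.linspace(0.0, 1.0, 40)
C = (x[:, None] - x[None, :]) ** 2    # squared-distance cost matrix
print(sinkhorn(mu, nu, C))
```

Shrinking eps tightens the approximation to unregularized OT at the cost of slower convergence of the scaling iterations.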
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- Infinite-Dimensional Sparse Learning in Linear System Identification [0.2867517731896504]
This paper proposes an infinite-dimensional sparse learning algorithm based on atomic norm regularization.
The difficulty in solving the problem lies in the fact that there are an infinite number of possible atomic models.
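For context, the atomic norm in question is the standard gauge of the convex hull of the atom set $\mathcal{A}$ (generic notation assumed here, not quoted from the paper):

```latex
\|x\|_{\mathcal{A}}
  = \inf\{\, t > 0 : x \in t \cdot \operatorname{conv}(\mathcal{A}) \,\}
  = \inf\Big\{ \textstyle\sum_{a \in \mathcal{A}} c_a \;:\;
      x = \textstyle\sum_{a \in \mathcal{A}} c_a\, a,\;\; c_a \ge 0 \Big\}
```

When $\mathcal{A}$ is infinite, the infimum runs over all finite conic combinations of atoms, which is exactly why there are infinitely many candidate atomic models.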
arXiv Detail & Related papers (2022-03-28T13:18:48Z)
- Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization [4.932130498861987]
We study the behavior of the appropriately scaled stationary distribution in the limit as the constant stepsize goes to zero.
We show that the limiting scaled stationary distribution is a solution of an integral equation.
arXiv Detail & Related papers (2021-11-11T17:39:50Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
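The LP relaxation in question is, in its standard pairwise local-polytope form (notation assumed; the entropy-regularized variant the title refers to adds an entropy term to this objective):

```latex
\max_{\mu \ge 0}\;
  \sum_{i \in V} \sum_{x_i} \theta_i(x_i)\, \mu_i(x_i)
  \;+\; \sum_{(i,j) \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j)
\quad \text{s.t.} \quad
  \sum_{x_i} \mu_i(x_i) = 1, \qquad
  \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i)
```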
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Explicit Regularization of Stochastic Gradient Methods through Duality [9.131027490864938]
We propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent.
For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing gradient methods in the interpolating regime.
arXiv Detail & Related papers (2020-03-30T20:44:56Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
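As background for what a Gaussian-projection embedding does, here is a minimal sketch-and-solve baseline for least squares (the paper's iterative methods are more refined; the dimensions and names below are illustrative):

```python
import numpy as np

def sketch_and_solve(A, b, m, seed=2):
    """Approximately solve min_x ||Ax - b|| by compressing the rows
    with a Gaussian embedding S (m rows), then solving the much
    smaller sketched problem exactly."""
    n = A.shape[0]
    S = np.random.default_rng(seed).standard_normal((m, n)) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Toy usage: a 5000 x 50 problem sketched down to 400 rows.
rng = np.random.default_rng(3)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x_sketch = sketch_and_solve(A, b, m=400)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print("error vs exact:", np.linalg.norm(x_sketch - x_exact))
```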
arXiv Detail & Related papers (2020-02-21T17:45:32Z)