Cogradient Descent for Bilinear Optimization
- URL: http://arxiv.org/abs/2006.09142v1
- Date: Tue, 16 Jun 2020 13:41:54 GMT
- Title: Cogradient Descent for Bilinear Optimization
- Authors: Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye,
David Doermann, Guodong Guo, Rongrong Ji
- Abstract summary: We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
- Score: 124.45816011848096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional learning methods simplify the bilinear model by regarding two
intrinsically coupled factors independently, which degrades the optimization
procedure. One reason lies in the insufficient training due to the asynchronous
gradient descent, which results in vanishing gradients for the coupled
variables. In this paper, we introduce a Cogradient Descent algorithm (CoGD) to
address the bilinear problem, based on a theoretical framework to coordinate
the gradient of hidden variables via a projection function. We solve one
variable by considering its coupling relationship with the other, leading to a
synchronous gradient descent to facilitate the optimization procedure. Our
algorithm is applied to solve problems with one variable under the sparsity
constraint, which is widely used in the learning paradigm. We validate our CoGD
considering an extensive set of applications including image reconstruction,
inpainting, and network pruning. Experiments show that it improves the
state-of-the-art by a significant margin.
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Implicit regularization in AI meets generalized hardness of
approximation in optimization -- Sharp results for diagonal linear networks [0.0]
We show sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks.
We link this to the phenomenon of phase transitions in generalized hardness of approximation.
Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem.
arXiv Detail & Related papers (2023-07-13T13:27:51Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems convex compact sets.
We leverage the structure of the objective by handling differentiable and non-differentiable separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - A conditional gradient homotopy method with applications to Semidefinite
Programming [1.6369790794838281]
homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints.
Our theoretical complexity is competitive when confronted to state-of-the-art SDP, with the decisive advantage of cheap projection-frees.
arXiv Detail & Related papers (2022-07-07T05:48:27Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z) - Conditional Gradient Methods for Convex Optimization with General Affine
and Nonlinear Constraints [8.643249539674612]
This paper presents new conditional gradient methods for solving convex optimization problems with general affine and nonlinear constraints.
We first present a new constraint extrapolated condition gradient (CoexCG) method that can achieve an $cal O (1/epsilon2)$ iteration complexity for both smooth and structured nonsmooth function constrained convex optimization.
We further develop novel variants of CoexCG, namely constraint extrapolated and dual regularized conditional gradient (CoexDurCG) methods, that can achieve similar iteration complexity to CoexCG but allow adaptive selection for algorithmic parameters.
arXiv Detail & Related papers (2020-06-30T23:49:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.