Learning to Initialize Gradient Descent Using Gradient Descent
- URL: http://arxiv.org/abs/2012.12141v1
- Date: Tue, 22 Dec 2020 16:23:36 GMT
- Title: Learning to Initialize Gradient Descent Using Gradient Descent
- Authors: Kartik Ahuja, Amit Dhurandhar, Kush R. Varshney
- Abstract summary: Non-convex optimization problems are challenging to solve.
As a simple alternative to hand-crafted initialization rules, we propose an approach for learning initialization rules that perform better than random initialization.
- Score: 21.02474345607079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-convex optimization problems are challenging to solve; the success and
computational expense of a gradient descent algorithm or variant depend heavily
on the initialization strategy. Often, either random initialization is used or
initialization rules are carefully designed by exploiting the nature of the
problem class. As a simple alternative to hand-crafted initialization rules, we
propose an approach for learning "good" initialization rules from previous
solutions. We provide theoretical guarantees that establish conditions, sufficient
in all cases and also necessary in some, under which our approach
performs better than random initialization. We apply our methodology to various
non-convex problems such as generating adversarial examples, generating post
hoc explanations for black-box machine learning models, and allocating
communication spectrum, and show consistent gains over other initialization
techniques.
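The recipe in the abstract (fit an initialization rule to previously found solutions, then warm-start gradient descent with it) can be illustrated with a small, self-contained sketch. The Python snippet below illustrates that general idea only; it is not the authors' implementation, and the toy objective family, the linear form of the rule, and all hyperparameters are assumptions made for the example.

```python
# Minimal sketch of the idea (not the authors' code): learn an initialization
# rule from previously solved instances, then warm-start gradient descent with it.
# The toy objective f(x; a) = sum_i (x_i^2 - a_i^2)^2, the linear rule x0 = W a + b,
# and all hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim = 5

def grad(x, a):
    # Gradient of the toy non-convex objective f(x; a) = sum((x^2 - a^2)^2).
    return 4.0 * x * (x ** 2 - a ** 2)

def gradient_descent(x0, a, lr=0.01, max_steps=5000, tol=1e-6):
    # Plain gradient descent; returns the final iterate and the number of steps used.
    x = x0.copy()
    for t in range(max_steps):
        g = grad(x, a)
        if np.linalg.norm(g) < tol:
            return x, t
        x -= lr * g
    return x, max_steps

# "Previous solutions": solve 50 training instances. For illustration we assume the
# previously found solutions all lie in a common basin (the positive orthant).
n_train = 50
train_params = 1.0 + rng.random((n_train, dim))
train_solutions = np.array(
    [gradient_descent(0.3 + np.abs(rng.normal(size=dim)), a)[0] for a in train_params]
)

# Learn the initialization rule x0 = W a + b by gradient descent on the squared
# error between predicted initializations and the previously found solutions.
W, b = np.zeros((dim, dim)), np.zeros(dim)
for _ in range(2000):
    err = train_params @ W.T + b - train_solutions
    W -= 0.05 * (err.T @ train_params) / n_train
    b -= 0.05 * err.mean(axis=0)

# New instance: compare the learned initialization against a random one.
a_new = 1.0 + rng.random(dim)
_, steps_learned = gradient_descent(W @ a_new + b, a_new)
_, steps_random = gradient_descent(rng.normal(size=dim), a_new)
print(f"steps with learned initialization: {steps_learned}")
print(f"steps with random initialization:  {steps_random}")
```

On this toy family the learned rule essentially recovers the map from problem parameters to the nearby basin, so the warm start typically needs far fewer gradient steps than a random start; the paper's theoretical guarantees characterize when such learned rules beat random initialization.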
Related papers
- Taking the human out of decomposition-based optimization via artificial
intelligence: Part II. Learning to initialize [0.0]
Active and supervised learning is used to learn a surrogate model that predicts computational performance.
The results show that the proposed approach can lead to a significant reduction in solution time.
arXiv Detail & Related papers (2023-10-10T23:49:26Z) - Invex Programs: First Order Algorithms and Their Convergence [66.40124280146863]
Invex programs are a special kind of non-convex problem that attains global minima at every stationary point.
We propose new first-order algorithms for general invex programs and establish their convergence rates.
Our proposed algorithm is the first algorithm to solve constrained invex programs.
arXiv Detail & Related papers (2023-07-10T10:11:01Z) - Restarts subject to approximate sharpness: A parameter-free and optimal scheme for first-order methods [0.6554326244334866]
Sharpness is an assumption in continuous optimization that bounds the distance from minima by objective function suboptimality.
Sharpness involves problem-specific constants that are typically unknown, and restart schemes typically reduce convergence rates.
We consider the assumption of approximate sharpness, a generalization of sharpness that incorporates an unknown constant perturbation to the objective function error.
arXiv Detail & Related papers (2023-01-05T19:01:41Z) - Self-adaptive algorithms for quasiconvex programming and applications to
machine learning [0.0]
We provide a self-adaptive step-size strategy that does not rely on line-search techniques, together with a generic approach that works under mild assumptions.
The proposed method is verified by preliminary results from some computational examples.
To demonstrate the effectiveness of the proposed technique for large-scale problems, we apply it to some experiments on machine learning.
arXiv Detail & Related papers (2022-12-13T05:30:29Z) - Faster Randomized Methods for Orthogonality Constrained Problems [7.002470330184841]
We show how to expand the application of randomized preconditioning to another important set of problems prevalent across data science.
We demonstrate our approach on the problem of computing the dominant canonical correlations and on the Fisher linear discriminant analysis problem.
arXiv Detail & Related papers (2021-06-22T21:15:00Z) - Provably Convergent Algorithms for Solving Inverse Problems Using
Generative Models [47.208080968675574]
We study the use of generative models in inverse problems, aiming at a more complete understanding.
We support our claims with experimental results for solving various inverse problems.
We propose an extension of our approach that can handle model mismatch (i.e., situations where the generative prior is not exactly applicable).
arXiv Detail & Related papers (2021-05-13T15:58:27Z) - Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z) - Run2Survive: A Decision-theoretic Approach to Algorithm Selection based
on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z) - Excursion Search for Constrained Bayesian Optimization under a Limited
Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms (see the sketch after this list).
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
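Picking up the Gaussian-projection idea from the last entry above, the following snippet is a generic sketch-and-precondition least-squares solver in that spirit; it is not the algorithm from that paper, and the problem sizes, sketch size, and iteration count are assumptions chosen for illustration.

```python
# Generic sketch-and-precondition least squares (illustrative, not the paper's method).
# A Gaussian projection compresses the tall matrix A, the R factor of the compressed
# matrix serves as a preconditioner, and preconditioned CG on the normal equations
# solves min ||Ax - b||_2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 50
A = rng.normal(size=(n, d)) * np.logspace(0, 3, d)   # tall and ill-conditioned design
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

m = 4 * d                                             # sketch size (assumption)
S = rng.normal(size=(m, n)) / np.sqrt(m)              # Gaussian projection
R = np.linalg.qr(S @ A, mode="r")                     # R^T R approximates A^T A

def apply_preconditioner(v):
    # Apply (R^T R)^{-1} via two triangular solves.
    return np.linalg.solve(R, np.linalg.solve(R.T, v))

# Preconditioned conjugate gradient on the normal equations A^T A x = A^T b.
x = np.zeros(d)
r = A.T @ b
z = apply_preconditioner(r)
p = z.copy()
for _ in range(30):
    Ap = A.T @ (A @ p)
    alpha = (r @ z) / (p @ Ap)
    x += alpha * p
    r_next = r - alpha * Ap
    z_next = apply_preconditioner(r_next)
    beta = (r_next @ z_next) / (r @ z)
    p = z_next + beta * p
    r, z = r_next, z_next

x_direct = np.linalg.lstsq(A, b, rcond=None)[0]
print("relative error vs. direct solve:",
      np.linalg.norm(x - x_direct) / np.linalg.norm(x_direct))
```

The point of the sketch is that the R factor of the compressed matrix S A acts as a preconditioner whose quality does not depend on the conditioning of A, so a handful of preconditioned CG passes suffice even when the original problem is badly conditioned.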
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.