Branch-and-Pruning Optimization Towards Global Optimality in Deep
Learning
- URL: http://arxiv.org/abs/2104.01730v1
- Date: Mon, 5 Apr 2021 00:43:03 GMT
- Title: Branch-and-Pruning Optimization Towards Global Optimality in Deep
Learning
- Authors: Yuanwei Wu, Ziming Zhang and Guanghui Wang
- Abstract summary: We propose a novel approximation algorithm, BPGrad, towards optimizing deep models globally via branch and pruning.
We prove that, by repeating such a branch-and-pruning procedure, we can locate the global optimality within finitely many iterations.
Empirically, an efficient adaptive solver based on BPGrad for DL is proposed as well, and it outperforms conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam.
- Score: 34.5772952287821
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding global optimality in deep learning (DL) has been
attracting more and more attention recently. However, conventional DL solvers
have not been intentionally developed to seek such global optimality. In this
paper, we propose a novel approximation algorithm, BPGrad, towards optimizing
deep models globally via branch and pruning. The proposed BPGrad algorithm is
based on the assumption of Lipschitz continuity in DL, and as a result, it can
adaptively determine the step size for the current gradient given the history
of previous updates, such that theoretically no smaller step size can achieve
the global optimality. We prove that, by repeating such a branch-and-pruning
procedure, we can locate the global optimality within finitely many
iterations. Empirically, an efficient adaptive solver based on BPGrad for DL
is also proposed, and it outperforms conventional DL solvers such as Adagrad,
Adadelta, RMSProp, and Adam in the tasks of object recognition, detection, and
segmentation. The code is available at https://github.com/RyanCV/BPGrad.
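The Lipschitz-based step-size rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' solver: the Lipschitz constant L, the global-minimum estimate f_min, the scaling factor rho, and the update below are simplifying assumptions, used only to show how the gap between the current loss and a lower bound can drive an adaptive step size.

    import numpy as np

    def lipschitz_adaptive_step(params, grad, loss, L, f_min=0.0, rho=0.9):
        """One illustrative Lipschitz-based adaptive update (not the exact BPGrad solver).

        Under the Lipschitz-continuity assumption |f(x) - f(y)| <= L * ||x - y||, the
        loss cannot drop from its current value to the estimated lower bound f_min
        unless the parameters move a distance of at least (loss - f_min) / L, so the
        step length is set proportionally to that gap (rho < 1 keeps it conservative).
        """
        g_norm = np.linalg.norm(grad) + 1e-12
        eta = rho * max(loss - f_min, 0.0) / (L * g_norm)  # adaptive step size
        return params - eta * grad

    # Toy usage on f(x) = ||x||^2, whose global minimum value is 0.
    x = np.array([3.0, -4.0])
    for _ in range(200):
        loss, grad = float(x @ x), 2.0 * x
        x = lipschitz_adaptive_step(x, grad, loss, L=20.0, f_min=0.0)
    print(x)  # approaches the global minimizer [0, 0]
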
Related papers
- Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition [7.300356389675634]
A common pursuit in modern statistical learning is to attain satisfactory generalization outside the source data distribution.
This paper studies foundational (high-dimensional) linear regression, where the ground-truth variables are confined to an ellipse-shaped constraint.
arXiv Detail & Related papers (2025-02-13T08:02:15Z)
- Gradient Descent Methods for Regularized Optimization [0.6624754673303327]
The gradient descent (GD) method is one of the primary methods used for numerical optimization of differentiable objective functions.
A more effective variant of GD, called proximal gradient descent, employs a technique known as soft-thresholding to shrink the iterates toward zero.
This paper proposes a novel algorithm for the proximal GD method that incorporates a variable step size.
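To make the soft-thresholding idea above concrete, here is a minimal Python sketch of proximal gradient descent for L1-regularized least squares; the backtracking rule used for the variable step size is a generic textbook choice, not necessarily the scheme proposed in the paper.

    import numpy as np

    def soft_threshold(v, tau):
        # Proximal operator of tau * ||.||_1: shrinks every coordinate toward zero.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def proximal_gd(A, b, lam, step=1.0, shrink=0.5, iters=200):
        """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient descent.

        The step size varies via simple backtracking on the smooth part; this is a
        generic rule, not necessarily the variable step size proposed in the paper.
        """
        def smooth(v):
            r = A @ v - b
            return 0.5 * float(r @ r)

        x = np.zeros(A.shape[1])
        for _ in range(iters):
            g = A.T @ (A @ x - b)                      # gradient of the smooth part
            t = step
            while True:                                # backtracking line search
                x_new = soft_threshold(x - t * g, t * lam)
                d = x_new - x
                if smooth(x_new) <= smooth(x) + g @ d + float(d @ d) / (2 * t):
                    break
                t *= shrink
            x = x_new
        return x

    # Toy usage: recover a sparse vector from noisy linear measurements.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(np.round(proximal_gd(A, b, lam=0.1), 2))
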
arXiv Detail & Related papers (2024-12-28T10:54:15Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
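The filtering step can be sketched in Python with a generic Gaussian-process surrogate and confidence bounds; the GP model, RBF kernel, candidate grid, and beta parameter below are illustrative assumptions, not BALLET's exact construction.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def high_confidence_roi(X_obs, y_obs, X_cand, beta=2.0):
        """Filter candidate points down to a high-confidence region of interest.

        Minimization setting: a candidate x is kept only if its lower confidence
        bound does not exceed the smallest upper confidence bound, i.e. x could
        still be the global minimizer with high probability. The GP surrogate,
        RBF kernel, and beta are generic choices, not BALLET's exact setup.
        """
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
        gp.fit(X_obs, y_obs)
        mu, sigma = gp.predict(X_cand, return_std=True)
        lcb, ucb = mu - beta * sigma, mu + beta * sigma
        return X_cand[lcb <= ucb.min()]

    # Toy usage on f(x) = sin(3x) + x^2 over [-1, 1].
    rng = np.random.default_rng(0)
    X_obs = rng.uniform(-1.0, 1.0, size=(8, 1))
    y_obs = np.sin(3.0 * X_obs[:, 0]) + X_obs[:, 0] ** 2
    X_cand = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
    roi = high_confidence_roi(X_obs, y_obs, X_cand)
    print(f"{len(roi)} of {len(X_cand)} candidates remain in the region of interest")
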
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
- Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs [27.41101006357176]
In this work, we take a minorization-maximization perspective to iteratively optimize the deep reactive policy w.r.t. a locally tight lower-bounded objective.
This novel formulation of learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective.
Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art planner.
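The minorization-maximization idea behind ILBO can be illustrated with a generic Python MM loop; the quadratic surrogate below is a standard textbook minorizer for a smooth objective, not the lower bound constructed in the paper.

    def mm_maximize(grad, x0, M, iters=100):
        """Generic minorization-maximization loop (illustrative, not the ILBO planner).

        At the current iterate x_t an M-smooth objective f is minorized by the
        locally tight quadratic lower bound
            g(x | x_t) = f(x_t) + grad(x_t) * (x - x_t) - (M / 2) * (x - x_t) ** 2,
        whose exact maximizer is x_{t+1} = x_t + grad(x_t) / M. Each step therefore
        optimizes a structurally simpler objective, and f never decreases.
        """
        x = float(x0)
        for _ in range(iters):
            x = x + grad(x) / M  # closed-form maximizer of the surrogate
        return x

    # Toy usage: maximize f(x) = -(x - 2)^2, with gradient -2 * (x - 2) and M = 2.
    print(mm_maximize(lambda x: -2.0 * (x - 2.0), x0=10.0, M=2.0))  # ~2.0
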
arXiv Detail & Related papers (2022-03-23T19:06:16Z)
- Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape [15.362190838843915]
We show that LPF-SGD converges to a better optimal point with smaller generalization error than SGD.
We show that our algorithm achieves superior generalization performance compared to the common DL training strategies.
arXiv Detail & Related papers (2022-01-20T07:13:04Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques.
VR-BGPO reaches the best complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
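For intuition, the Python sketch below performs a single Bregman (mirror-ascent) policy update over a discrete action simplex using the negative-entropy mirror map; it is a generic illustration and omits the momentum and variance-reduction components of (VR-)BGPO.

    import numpy as np

    def bregman_policy_step(policy, grad, eta=0.1):
        """One Bregman (mirror-ascent) update of a policy over a discrete action set.

        With the negative-entropy mirror map, the Bregman proximal step
            argmax_p  <grad, p> - (1 / eta) * KL(p || policy)
        has the closed form below (an exponentiated-gradient update). This is a
        generic illustration; the momentum and variance-reduction parts of
        (VR-)BGPO are omitted.
        """
        logits = np.log(policy) + eta * grad
        logits -= logits.max()                      # subtract max for numerical stability
        new_policy = np.exp(logits)
        return new_policy / new_policy.sum()

    # Toy usage: repeated updates concentrate probability on the best-scoring action.
    policy = np.full(4, 0.25)
    grad = np.array([1.0, 0.0, -1.0, 0.0])          # e.g., an estimated policy gradient
    for _ in range(50):
        policy = bregman_policy_step(policy, grad)
    print(np.round(policy, 3))
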
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Bayesian Optimization Meets Laplace Approximation for Robotic Introspection [41.117361086267806]
We introduce a scalable Laplace Approximation (LA) technique to make Deep Neural Networks (DNNs) more introspective.
In particular, we propose a novel Bayesian Optimization (BO) algorithm to mitigate their tendency of under-fitting the true weight posterior.
We show that the proposed framework can be scaled up to large datasets and architectures.
arXiv Detail & Related papers (2020-10-30T09:28:10Z)
- Learning to be Global Optimizer [28.88646928299302]
We learn an algorithm that combines an optimal update network with the capability of escaping local optima, evaluated on some benchmark functions.
We show that the learned algorithm significantly outperforms some well-known classical optimization algorithms.
arXiv Detail & Related papers (2020-03-10T03:46:25Z)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to simultaneously achieve the best-available convergence for non-PL objectives while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the OptimisticOA algorithm for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)