Branch-and-Pruning Optimization Towards Global Optimality in Deep
Learning
- URL: http://arxiv.org/abs/2104.01730v1
- Date: Mon, 5 Apr 2021 00:43:03 GMT
- Title: Branch-and-Pruning Optimization Towards Global Optimality in Deep
Learning
- Authors: Yuanwei Wu, Ziming Zhang and Guanghui Wang
- Abstract summary: We propose a novel approximation algorithm, BPGrad, towards optimizing deep models globally via branch and pruning.
We prove that, by repeating such a branch-and-pruning procedure, we can locate the global optimality within finitely many iterations.
Empirically, an efficient adaptive solver based on BPGrad for DL is proposed as well, and it outperforms conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam.
- Score: 34.5772952287821
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding global optimality in deep learning (DL) has been
attracting more and more attention recently. However, conventional DL solvers
have not been intentionally developed to seek such global optimality. In this
paper, we propose a novel approximation algorithm, BPGrad, towards optimizing
deep models globally via branch and pruning. The proposed BPGrad algorithm is
based on the assumption of Lipschitz continuity in DL, and as a result, it can
adaptively determine the step size for the current gradient given the history
of previous updates, such that theoretically no smaller step size can achieve
the global optimality. We prove that, by repeating such a branch-and-pruning
procedure, we can locate the global optimality within finitely many
iterations. Empirically, an efficient adaptive solver based on BPGrad for DL
is also proposed, and it outperforms conventional DL solvers such as Adagrad,
Adadelta, RMSProp, and Adam in the tasks of object recognition, detection, and
segmentation. The code is available at https://github.com/RyanCV/BPGrad.
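The Lipschitz-based step-size rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' solver: the Lipschitz constant L, the global-minimum estimate f_min, the scaling factor rho, and the update below are simplifying assumptions, used only to show how the gap between the current loss and a lower bound can drive an adaptive step size.

    import numpy as np

    def lipschitz_adaptive_step(params, grad, loss, L, f_min=0.0, rho=0.9):
        """One illustrative Lipschitz-based adaptive update (not the exact BPGrad solver).

        Under the Lipschitz-continuity assumption |f(x) - f(y)| <= L * ||x - y||, the
        loss cannot drop from its current value to the estimated lower bound f_min
        unless the parameters move a distance of at least (loss - f_min) / L, so the
        step length is set proportionally to that gap (rho < 1 keeps it conservative).
        """
        g_norm = np.linalg.norm(grad) + 1e-12
        eta = rho * max(loss - f_min, 0.0) / (L * g_norm)  # adaptive step size
        return params - eta * grad

    # Toy usage on f(x) = ||x||^2, whose global minimum value is 0.
    x = np.array([3.0, -4.0])
    for _ in range(200):
        loss, grad = float(x @ x), 2.0 * x
        x = lipschitz_adaptive_step(x, grad, loss, L=20.0, f_min=0.0)
    print(x)  # approaches the global minimizer [0, 0]
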
Related papers
- Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition [7.300356389675634]
A common pursuit in modern statistical learning is to attain satisfactory generalization outside the source data distribution.
This paper studies foundational (high-dimensional) linear regression, where the ground-truth variables are confined to an ellipse-shaped constraint.
arXiv Detail & Related papers (2025-02-13T08:02:15Z)
- Gradient Descent Methods for Regularized Optimization [0.6624754673303327]
The gradient descent (GD) method is one of the primary methods used for numerical optimization of differentiable objective functions.
A more effective variant of GD, called proximal gradient descent, employs a technique known as soft-thresholding to shrink the iterates toward zero.
This paper proposes a novel algorithm for the proximal GD method that incorporates a variable step size.
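To make the soft-thresholding idea above concrete, here is a minimal Python sketch of proximal gradient descent for L1-regularized least squares; the backtracking rule used for the variable step size is a generic textbook choice, not necessarily the scheme proposed in the paper.

    import numpy as np

    def soft_threshold(v, tau):
        # Proximal operator of tau * ||.||_1: shrinks every coordinate toward zero.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def proximal_gd(A, b, lam, step=1.0, shrink=0.5, iters=200):
        """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient descent.

        The step size varies via simple backtracking on the smooth part; this is a
        generic rule, not necessarily the variable step size proposed in the paper.
        """
        def smooth(v):
            r = A @ v - b
            return 0.5 * float(r @ r)

        x = np.zeros(A.shape[1])
        for _ in range(iters):
            g = A.T @ (A @ x - b)                      # gradient of the smooth part
            t = step
            while True:                                # backtracking line search
                x_new = soft_threshold(x - t * g, t * lam)
                d = x_new - x
                if smooth(x_new) <= smooth(x) + g @ d + float(d @ d) / (2 * t):
                    break
                t *= shrink
            x = x_new
        return x

    # Toy usage: recover a sparse vector from noisy linear measurements.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(np.round(proximal_gd(A, b, lam=0.1), 2))
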
arXiv Detail & Related papers (2024-12-28T10:54:15Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
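The filtering step can be sketched in Python with a generic Gaussian-process surrogate and confidence bounds; the GP model, RBF kernel, candidate grid, and beta parameter below are illustrative assumptions, not BALLET's exact construction.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def high_confidence_roi(X_obs, y_obs, X_cand, beta=2.0):
        """Filter candidate points down to a high-confidence region of interest.

        Minimization setting: a candidate x is kept only if its lower confidence
        bound does not exceed the smallest upper confidence bound, i.e. x could
        still be the global minimizer with high probability. The GP surrogate,
        RBF kernel, and beta are generic choices, not BALLET's exact setup.
        """
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
        gp.fit(X_obs, y_obs)
        mu, sigma = gp.predict(X_cand, return_std=True)
        lcb, ucb = mu - beta * sigma, mu + beta * sigma
        return X_cand[lcb <= ucb.min()]

    # Toy usage on f(x) = sin(3x) + x^2 over [-1, 1].
    rng = np.random.default_rng(0)
    X_obs = rng.uniform(-1.0, 1.0, size=(8, 1))
    y_obs = np.sin(3.0 * X_obs[:, 0]) + X_obs[:, 0] ** 2
    X_cand = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
    roi = high_confidence_roi(X_obs, y_obs, X_cand)
    print(f"{len(roi)} of {len(X_cand)} candidates remain in the region of interest")
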
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
- Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs [27.41101006357176]
In this work, we take a minorization-maximization perspective to iteratively optimize the deep reactive policy w.r.t. a locally tight lower-bounded objective.
This novel formulation of learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective.
Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art planner.
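The minorization-maximization idea behind ILBO can be illustrated with a generic Python MM loop; the quadratic surrogate below is a standard textbook minorizer for a smooth objective, not the lower bound constructed in the paper.

    def mm_maximize(grad, x0, M, iters=100):
        """Generic minorization-maximization loop (illustrative, not the ILBO planner).

        At the current iterate x_t an M-smooth objective f is minorized by the
        locally tight quadratic lower bound
            g(x | x_t) = f(x_t) + grad(x_t) * (x - x_t) - (M / 2) * (x - x_t) ** 2,
        whose exact maximizer is x_{t+1} = x_t + grad(x_t) / M. Each step therefore
        optimizes a structurally simpler objective, and f never decreases.
        """
        x = float(x0)
        for _ in range(iters):
            x = x + grad(x) / M  # closed-form maximizer of the surrogate
        return x

    # Toy usage: maximize f(x) = -(x - 2)^2, with gradient -2 * (x - 2) and M = 2.
    print(mm_maximize(lambda x: -2.0 * (x - 2.0), x0=10.0, M=2.0))  # ~2.0
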
arXiv Detail & Related papers (2022-03-23T19:06:16Z)
- Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape [15.362190838843915]
We show that LPF-SGD converges to a better optimal point with smaller generalization error than SGD.
We show that our algorithm achieves superior generalization performance compared to the common DL training strategies.
arXiv Detail & Related papers (2022-01-20T07:13:04Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques.
VR-BGPO reaches the best complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
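For intuition, the Python sketch below performs a single Bregman (mirror-ascent) policy update over a discrete action simplex using the negative-entropy mirror map; it is a generic illustration and omits the momentum and variance-reduction components of (VR-)BGPO.

    import numpy as np

    def bregman_policy_step(policy, grad, eta=0.1):
        """One Bregman (mirror-ascent) update of a policy over a discrete action set.

        With the negative-entropy mirror map, the Bregman proximal step
            argmax_p  <grad, p> - (1 / eta) * KL(p || policy)
        has the closed form below (an exponentiated-gradient update). This is a
        generic illustration; the momentum and variance-reduction parts of
        (VR-)BGPO are omitted.
        """
        logits = np.log(policy) + eta * grad
        logits -= logits.max()                      # subtract max for numerical stability
        new_policy = np.exp(logits)
        return new_policy / new_policy.sum()

    # Toy usage: repeated updates concentrate probability on the best-scoring action.
    policy = np.full(4, 0.25)
    grad = np.array([1.0, 0.0, -1.0, 0.0])          # e.g., an estimated policy gradient
    for _ in range(50):
        policy = bregman_policy_step(policy, grad)
    print(np.round(policy, 3))
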
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Bayesian Optimization Meets Laplace Approximation for Robotic Introspection [41.117361086267806]
We introduce a scalable Laplace Approximation (LA) technique to make Deep Neural Networks (DNNs) more introspective.
In particular, we propose a novel Bayesian Optimization (BO) algorithm to mitigate their tendency of under-fitting the true weight posterior.
We show that the proposed framework can be scaled up to large datasets and architectures.
arXiv Detail & Related papers (2020-10-30T09:28:10Z)
- Learning to be Global Optimizer [28.88646928299302]
We learn an algorithm that combines an optimal update network with the capability of escaping local optima, evaluated on some benchmark functions.
We show that the learned algorithm significantly outperforms some well-known classical optimization algorithms.
arXiv Detail & Related papers (2020-03-10T03:46:25Z)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to simultaneously achieve the best-available convergence for non-PL objectives while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the OptimisticOA algorithm for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)