BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model
- URL: http://arxiv.org/abs/2009.13799v1
- Date: Tue, 29 Sep 2020 06:12:32 GMT
- Title: BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model
- Authors: Junjie Liu, Dongchao Wen, Deyu Wang, Wei Tao, Tse-Wei Chen, Kinya Osa,
Masami Kato
- Abstract summary: Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but
guaranteeing the effective and efficient training of BNNs is an unsolved problem.
We propose a BAMSProd algorithm with a key observation that the convergence property of optimizing deep binary model is strongly related to the quantization errors.
- Score: 34.093978443640616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent methods have significantly reduced the performance degradation of
Binary Neural Networks (BNNs), but guaranteeing the effective and efficient
training of BNNs is an unsolved problem. The main reason is that the estimated
gradients produced by the Straight-Through Estimator (STE) mismatch the
gradients of the real derivatives. In this paper, we provide an explicit convex
optimization example where training the BNNs with traditional adaptive
optimization methods still faces the risk of non-convergence, and identify that
constraining the range of gradients is critical for optimizing the deep binary
model to avoid highly suboptimal solutions. To solve the above issues, we
propose a BAMSProd algorithm with a key observation that the convergence
property of optimizing deep binary model is strongly related to the
quantization errors. In brief, it employs an adaptive range constraint via an
error measurement for smoothing the gradient transition, while following the
exponential moving average strategy from AMSGrad to avoid error accumulation during
the optimization. The experiments verify the corollary of the theoretical
convergence analysis, and further demonstrate that our optimization method can
speed up convergence by about 1.2x and significantly boost the performance of
BNNs, by about 3.7% over the specific binary optimizer, even in a highly
non-convex optimization problem.
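The abstract combines three ingredients: the STE, which substitutes a surrogate gradient for the derivative of the sign function (zero almost everywhere); AMSGrad's exponential moving averages with a non-decreasing second-moment estimate; and a gradient range constraint tied to quantization error. The Python sketch that follows wires these together on a toy problem. It is an illustration under stated assumptions, not the authors' BAMSProd update: the hard-tanh STE (gradient zeroed where |w| > 1), the mean-absolute quantization-error measure, and the bound `base_bound / (1 + error)` are hypothetical choices made here, and `RangeConstrainedAMSGrad`, `quantization_error` and `base_bound` are hypothetical names.

```python
import math


def sign_binarize(w):
    """Forward pass of a binary weight: sign(w), with 0 mapped to +1."""
    return [1.0 if x >= 0 else -1.0 for x in w]


def ste_grad(w, grad_wrt_binary, clip=1.0):
    """Straight-Through Estimator (hard-tanh variant, an assumption here):
    copy the gradient w.r.t. sign(w) onto w, but zero it where |w| > clip."""
    return [g if abs(x) <= clip else 0.0 for x, g in zip(w, grad_wrt_binary)]


def quantization_error(w):
    """Hypothetical error measure: mean |w - sign(w)| over all weights."""
    b = sign_binarize(w)
    return sum(abs(x - y) for x, y in zip(w, b)) / len(w)


class RangeConstrainedAMSGrad:
    """AMSGrad plus a gradient range constraint that tightens as the
    quantization error grows. The bound formula is hypothetical and is
    NOT the BAMSProd rule from the paper."""

    def __init__(self, n, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, base_bound=1.0):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.base_bound = base_bound
        self.m = [0.0] * n      # EMA of gradients (first moment)
        self.v = [0.0] * n      # EMA of squared gradients (second moment)
        self.v_hat = [0.0] * n  # running max of v, as in AMSGrad

    def step(self, w, grad):
        # Hypothetical adaptive range: larger quantization error -> tighter clip.
        bound = self.base_bound / (1.0 + quantization_error(w))
        new_w = []
        for i, (x, g) in enumerate(zip(w, grad)):
            g = max(-bound, min(bound, g))  # constrain the gradient range
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            self.v_hat[i] = max(self.v_hat[i], self.v[i])  # never shrinks
            new_w.append(x - self.lr * self.m[i] / (math.sqrt(self.v_hat[i]) + self.eps))
        return new_w


# Toy problem: make sign(w) match a target pattern, loss L(b) = sum((b - t)^2).
target = [1.0, -1.0, 1.0, -1.0]
w = [-0.3, 0.2, -0.1, 0.4]  # latent real-valued weights, every sign wrong
opt = RangeConstrainedAMSGrad(len(w), lr=0.05)
for _ in range(50):
    b = sign_binarize(w)
    grad_b = [2.0 * (bi - ti) for bi, ti in zip(b, target)]  # dL/db
    w = opt.step(w, ste_grad(w, grad_b))
print(sign_binarize(w))  # -> [1.0, -1.0, 1.0, -1.0]
```

Shrinking the clipping bound while the latent weights sit far from ±1 is only one plausible reading of "an adaptive range constraint via an error measurement"; the exact constraint used by BAMSProd is defined in the paper. The running maximum `v_hat` is the AMSGrad ingredient: it keeps the effective step size from growing again after a burst of large gradients.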
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates Normalized Stochastic Gradient Descent with Clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
Date: 2024-10-21T22:40:42Z
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
Date: 2024-10-14T12:10:06Z
- Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
Novel noise-free Bayesian optimization strategies are proposed that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
The new algorithms retain the ease of implementation of the classical GP-UCB, while an additional exploration step facilitates their convergence.
Date: 2024-01-30T14:16:06Z
- Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) in the PL region, and (iii) smaller complexity under less of data.
Date: 2022-09-01T11:05:26Z
- Data-driven evolutionary algorithm for oil reservoir well-placement and control optimization [3.012067935276772]
A generalized data-driven evolutionary algorithm (GDDE) is proposed to reduce the number of simulation runs in well-placement and control optimization problems.
Probabilistic neural network (PNN) is adopted as the classifier to select informative and promising candidates.
Date: 2022-06-07T09:07:49Z
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
Date: 2020-12-21T17:29:58Z
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel, efficient stochastic gradient estimator named stoc-BiO.
Date: 2020-10-15T18:09:48Z
- Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification [0.0]
We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints.
Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals.
We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation.
Date: 2020-08-27T09:15:02Z
- Iterative Surrogate Model Optimization (ISMO): An active learning algorithm for PDE constrained optimization with deep neural networks [14.380314061763508]
We present a novel active learning algorithm, termed iterative surrogate model optimization (ISMO).
This algorithm is based on deep neural networks and its key feature is the iterative selection of training data through a feedback loop between deep neural networks and any underlying standard optimization algorithm.
Date: 2020-08-13T07:31:07Z
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
Date: 2020-06-11T17:43:19Z
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the OptimisticOA algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
Date: 2019-12-26T22:10:10Z