Steepest Descent Neural Architecture Optimization: Escaping Local
Optimum with Signed Neural Splitting
- URL: http://arxiv.org/abs/2003.10392v5
- Date: Mon, 21 Jun 2021 01:07:37 GMT
- Title: Steepest Descent Neural Architecture Optimization: Escaping Local
Optimum with Signed Neural Splitting
- Authors: Lemeng Wu, Mao Ye, Qi Lei, Jason D. Lee, Qiang Liu
- Abstract summary: We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the appearance of splitting stability in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
- Score: 60.97465664419395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing efficient and principled neural architecture optimization methods
is a critical challenge of modern deep learning. Recently, Liu et al. [19]
proposed a splitting steepest descent (S2D) method that jointly optimizes the
neural parameters and architectures based on progressively growing network
structures by splitting neurons into multiple copies in a steepest descent
fashion. However, S2D suffers from a local optimality issue when all the
neurons become "splitting stable", a concept akin to local stability in
parametric optimization. In this work, we develop a significant and surprising
extension of the splitting descent framework that addresses the local
optimality issue. The idea is to observe that the original S2D is unnecessarily
restricted to splitting neurons into positive weighted copies. By simply
allowing both positive and negative weights during splitting, we can eliminate
the appearance of splitting stability in S2D and hence escape the local optima
to obtain better performance. By incorporating signed splittings, we
significantly extend the optimization power of splitting steepest descent both
theoretically and empirically. We verify our method on various challenging
benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform
S2D and other advanced methods on learning accurate and energy-efficient neural
networks.
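As a rough illustration of the splitting idea, the sketch below (written under assumed interfaces, not the authors' implementation) shows the bookkeeping of a single splitting step in Python/NumPy: a neuron is replaced by two copies whose output weights sum to the original weight, and signed splitting simply allows one of those weights to be negative. The helper name split_neuron, the displacement along the minimum eigenvector, and the specific weight magnitudes are illustrative assumptions; the splitting matrix itself is taken as precomputed.

```python
import numpy as np

def split_neuron(theta, out_weight, splitting_matrix, step=1e-2, signed=True):
    """Illustrative bookkeeping for one splitting step (not the authors' code).

    theta            -- parameter vector of a single neuron
    out_weight       -- the neuron's output (fan-out) weight
    splitting_matrix -- the neuron's splitting matrix S(theta), assumed to be
                        precomputed from second-order loss information
    Returns a list of (theta_copy, out_weight_copy) pairs replacing the neuron.
    """
    eigvals, eigvecs = np.linalg.eigh(splitting_matrix)
    lam_min, v_min = eigvals[0], eigvecs[:, 0]

    if lam_min < 0:
        # Vanilla S2D: two positively weighted copies, displaced in opposite
        # directions along the minimum eigenvector; the output weights stay
        # positive and sum to the original weight.
        return [(theta + step * v_min, 0.5 * out_weight),
                (theta - step * v_min, 0.5 * out_weight)]

    if signed:
        # Signed splitting: copies whose output weights may take opposite signs
        # (here (1 + a) and -a, still summing to the original weight).  The
        # direction and the value of `a` are placeholders; the paper derives
        # the optimal signed scheme, which can yield descent even when the
        # neuron is splitting stable under positive-only splitting.
        a = 0.5
        return [(theta + step * v_min, (1 + a) * out_weight),
                (theta - step * v_min, -a * out_weight)]

    # Splitting stable under positive-only splitting: keep the neuron as is.
    return [(theta, out_weight)]
```

The only change relative to vanilla S2D in this sketch is the sign pattern of the returned output weights; everything else mirrors the positive-weight split.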
Related papers
- Efficient Second-Order Neural Network Optimization via Adaptive Trust Region Methods [0.0]
SecondOrderAdaptive (SOAA) is a novel optimization algorithm designed to overcome limitations of traditional second-order techniques.
We empirically demonstrate that SOAA achieves faster and more stable convergence compared to first-order approximations.
arXiv Detail & Related papers (2024-10-03T08:23:06Z) - Soft Merging: A Flexible and Robust Soft Model Merging Approach for
Enhanced Neural Network Performance [6.599368083393398]
Stochastic Gradient Descent (SGD) is often limited to converging to local optima, which limits model performance.
The proposed soft merging method reduces the impact of local optima models that lead to undesirable results.
Experiments underscore the effectiveness of the merged networks.
arXiv Detail & Related papers (2023-09-21T17:07:31Z) - MIPS-Fusion: Multi-Implicit-Submaps for Scalable and Robust Online
Neural RGB-D Reconstruction [15.853932110058585]
We introduce a robust and scalable online RGB-D reconstruction method based on a novel neural implicit representation -- multi-implicit-submap.
In our method, neural submaps are incrementally allocated alongside the scanning trajectory and efficiently learned with local neural bundle adjustments.
For the first time, randomized optimization is made possible in neural tracking with several key designs to the learning process, enabling efficient and robust tracking even under fast camera motions.
arXiv Detail & Related papers (2023-08-17T02:33:16Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model (DEQ) is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer (a minimal fixed-point sketch appears after this list).
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch (the N:M pattern is illustrated in a short sketch after this list).
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Improving Neural Network Training in Low Dimensional Random Bases [5.156484100374058]
We show that keeping the random projection fixed throughout training is detrimental to optimization.
We propose re-drawing the random subspace at each step, which yields significantly better performance.
We realize further improvements by applying independent projections to different parts of the network, making the approximation more efficient as network dimensionality grows.
arXiv Detail & Related papers (2020-11-09T19:50:19Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
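To make the fixed-point view of deep equilibrium models above concrete, here is a minimal sketch (an illustrative assumption, not code from the cited paper) that evaluates a DEQ-style layer z = tanh(Wz + Ux + b) by naive fixed-point iteration; practical implementations typically use Broyden or Anderson solvers and differentiate implicitly through the fixed point.

```python
import numpy as np

def deq_forward(x, W, U, b, tol=1e-6, max_iter=200):
    """Naive fixed-point iteration z = tanh(W z + U x + b) for a DEQ-style layer.

    The layer form, the tanh nonlinearity, and plain iteration are illustrative
    assumptions; real deep equilibrium models use root-finding solvers and
    implicit differentiation through the fixed point.
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z  # may not have converged; real solvers monitor the residual

# Tiny usage example with random weights, scaled so the iteration contracts.
rng = np.random.default_rng(0)
d, n = 8, 4
W = 0.25 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, n))
b = rng.standard_normal(d)
z_star = deq_forward(rng.standard_normal(n), W, U, b)
```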
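Similarly, for the N:M fine-grained sparsity entry above, the following sketch (illustrative only; the cited paper trains such networks from scratch rather than pruning a dense matrix) projects a weight matrix onto a 2:4 pattern by keeping the two largest-magnitude weights in every group of four consecutive weights.

```python
import numpy as np

def prune_n_m(weights, n=2, m=4):
    """Project a weight matrix onto an N:M sparsity pattern: in every group of
    m consecutive weights along the last axis, keep the n largest magnitudes
    and zero out the rest.  (Sketch only, not the cited paper's method.)
    """
    w = np.asarray(weights, dtype=float)
    assert w.shape[-1] % m == 0, "last dimension must be divisible by m"
    groups = w.reshape(-1, m)                      # one row per group of m
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(w.shape)

# Example: 2:4-sparse version of a 2x8 matrix.
W = np.arange(16, dtype=float).reshape(2, 8) - 7.5
W_sparse = prune_n_m(W, n=2, m=4)
```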
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.