Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
- URL: http://arxiv.org/abs/2208.02017v1
- Date: Wed, 3 Aug 2022 12:38:23 GMT
- Title: Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
- Authors: Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz
- Abstract summary: Training deep neural networks consumes increasing computational resource shares in many compute centers.
We introduce a novel second-order optimization method that requires the effect of the Hessian on a vector only.
We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training deep neural networks consumes increasing computational resource
shares in many compute centers. Often, a brute force approach to obtain
hyperparameter values is employed. Our goal is (1) to enhance this by enabling
second-order optimization methods with fewer hyperparameters for large-scale
neural networks and (2) to survey the performance of optimizers on
specific tasks to suggest to users the best one for their problem. We introduce a
novel second-order optimization method that requires the effect of the Hessian
on a vector only and avoids the huge cost of explicitly setting up the Hessian
for large-scale networks.
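The core of such a matrix-free Newton-CG scheme is the Hessian-vector product, which reverse-mode automatic differentiation supplies without ever materializing the Hessian. Below is a minimal illustrative sketch, not the authors' implementation: the toy loss, function names, and the plain CG loop without curvature safeguards are assumptions for illustration only.

```python
import torch

def hvp(loss_fn, w, v):
    """Hessian-vector product H(w) @ v via double backprop (Pearlmutter's trick)."""
    loss = loss_fn(w)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (hv,) = torch.autograd.grad(g, w, grad_outputs=v)
    return hv

def newton_cg_step(loss_fn, w, cg_iters=10, tol=1e-6):
    """Approximately solve H p = -g with conjugate gradients, using only HVPs."""
    loss = loss_fn(w)
    (g,) = torch.autograd.grad(loss, w)
    p = torch.zeros_like(w)     # Newton direction being built up
    r = -g                      # residual of H p + g = 0
    d = r.clone()               # CG search direction
    rs_old = r.dot(r)
    for _ in range(cg_iters):
        Hd = hvp(loss_fn, w, d)            # only the *action* of H is needed
        alpha = rs_old / d.dot(Hd)
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = r.dot(r)
        if rs_new.sqrt() < tol:
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return p                    # to be applied with a step size / line search

# Toy usage with a stand-in loss (its Hessian is positive definite by construction)
w = torch.randn(5, requires_grad=True)
loss_fn = lambda w: (torch.tanh(w) ** 2).sum() + 0.5 * w.dot(w)
step = newton_cg_step(loss_fn, w)
```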
We compare the proposed second-order method with two state-of-the-art
optimizers on five representative neural network problems, including regression
and very deep networks from computer vision or variational autoencoders. For
the largest setup, we efficiently parallelized the optimizers with Horovod and
applied them to an 8-GPU NVIDIA P100 (DGX-1) machine.
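Horovod scales out data-parallel training by wrapping an existing optimizer so that gradients are averaged across workers at each step. A minimal sketch of that wrapping follows; the placeholder model, learning-rate scaling, and script structure are assumptions, not the paper's actual training code.

```python
import torch
import horovod.torch as hvd

hvd.init()                                   # one process per GPU
torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(784, 10).cuda()      # placeholder model
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01 * hvd.size())  # scale lr with worker count

# Average gradients across all workers and start every worker from identical state.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

# ... usual training loop: each worker trains on its own shard of the data,
# and optimizer.step() transparently all-reduces the gradients.
```

Launched with, e.g., `horovodrun -np 8 python train.py` to occupy all eight GPUs of a DGX-1.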
Related papers
- Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers [14.756988176469365]
An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of Deep Neural Networks.
Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion.
arXiv Detail & Related papers (2024-08-22T17:35:18Z) - Convergence and scaling of Boolean-weight optimization for hardware
reservoirs [0.0]
We analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimize the readout layer of a random, recurrently connected neural network.
Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment.
arXiv Detail & Related papers (2023-05-13T12:15:25Z) - Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through a gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating its prediction accuracy and the network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization in which the predictive model is a deep neural network.
Our method requires far fewer communication rounds in theory.
Our experiments on several benchmark datasets demonstrate the effectiveness of our method and corroborate our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Hyperparameter Optimization in Binary Communication Networks for
Neuromorphic Deployment [4.280642750854163]
Training neural networks for neuromorphic deployment is non-trivial.
We introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware.
We show that by optimizing the hyperparameters of this algorithm for each dataset, we achieve accuracy improvements over the previous state of the art on each dataset.
arXiv Detail & Related papers (2020-04-21T01:15:45Z) - Steepest Descent Neural Architecture Optimization: Escaping Local
Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the appearance of splitting stability in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.