Improving Neural Network Training in Low Dimensional Random Bases
- URL: http://arxiv.org/abs/2011.04720v1
- Date: Mon, 9 Nov 2020 19:50:19 GMT
- Title: Improving Neural Network Training in Low Dimensional Random Bases
- Authors: Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi
- Abstract summary: We show that keeping the random projection fixed throughout training is detrimental to optimization.
We propose re-drawing the random subspace at each step, which yields significantly better performance.
We realize further improvements by applying independent projections to different parts of the network, making the approximation more efficient as network dimensionality grows.
- Score: 5.156484100374058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Gradient Descent (SGD) has proven to be remarkably effective in
optimizing deep neural networks that employ ever-larger numbers of parameters.
Yet, improving the efficiency of large-scale optimization remains a vital and
highly active area of research. Recent work has shown that deep neural networks
can be optimized in randomly-projected subspaces of much smaller dimensionality
than their native parameter space. While such training is promising for more
efficient and scalable optimization schemes, its practical application is
limited by inferior optimization performance. Here, we improve on recent random
subspace approaches as follows: Firstly, we show that keeping the random
projection fixed throughout training is detrimental to optimization. We propose
re-drawing the random subspace at each step, which yields significantly better
performance. We realize further improvements by applying independent
projections to different parts of the network, making the approximation more
efficient as network dimensionality grows. To implement these experiments, we
leverage hardware-accelerated pseudo-random number generation to construct the
random projections on-demand at every optimization step, allowing us to
distribute the computation of independent random directions across multiple
workers with shared random seeds. This yields significant reductions in memory
and is up to 10 times faster for the workloads in question.
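As a rough illustration of the approach described in the abstract, the sketch below (Python with NumPy; the toy quadratic loss, the dimensions, the learning rate, and the seed scheme are illustrative assumptions rather than the authors' actual setup) re-draws a random D x d basis at every optimization step from a seed derived from a shared seed plus the step index, so that independent workers could regenerate the same random directions locally instead of communicating the projection matrix.

    import numpy as np

    D, d = 1000, 20                      # native parameter dimension, subspace dimension
    rng_data = np.random.default_rng(0)  # fixed toy problem: 0.5 * ||A (theta - theta*)||^2
    A = rng_data.standard_normal((D, D)) / np.sqrt(D)
    theta_star = rng_data.standard_normal(D)

    def grad(theta):
        # Full-space gradient of the toy quadratic loss.
        return A.T @ (A @ (theta - theta_star))

    theta = np.zeros(D)
    lr, shared_seed = 0.1, 1234

    for step in range(500):
        # Re-draw the random basis at every step. Deriving the seed from a
        # shared seed plus the step index lets every worker regenerate the
        # same D x d projection on-demand instead of communicating it.
        rng = np.random.default_rng(shared_seed + step)
        P = rng.standard_normal((D, d)) / np.sqrt(d)

        # Low-dimensional update: project the gradient into the subspace,
        # then map the step back to the native parameter space.
        c = P.T @ grad(theta)
        theta -= lr * (P @ c)

    print("final loss:", 0.5 * np.sum((A @ (theta - theta_star)) ** 2))

Applying independent projections to different parts of the network, as the abstract suggests, would amount to repeating this construction with a separate basis (and seed offset) per parameter block.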
Related papers
- Convergence and scaling of Boolean-weight optimization for hardware
reservoirs [0.0]
We analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimize the readout layer of a random recurrently connected neural network.
Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment.
arXiv Detail & Related papers (2023-05-13T12:15:25Z) - Transformer-Based Learned Optimization [37.84626515073609]
We propose a new approach to learned optimization where we represent the computation of the optimizer's update step using a neural network.
Our innovation is a new neural network architecture inspired by the classic BFGS algorithm.
We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms.
arXiv Detail & Related papers (2022-12-02T09:47:08Z) - A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z) - Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs [0.0]
Training deep neural networks consumes increasing computational resource shares in many compute centers.
We introduce a novel second-order optimization method that requires the effect of the Hessian on a vector only.
We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems.
arXiv Detail & Related papers (2022-08-03T12:38:23Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between equilibrium-model inference and optimization over the model's inputs.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Delta-STN: Efficient Bilevel Optimization for Neural Networks using
Structured Response Jacobians [5.33024001730262]
Self-Tuning Networks (STNs) have recently gained traction due to their ability to amortize the optimization of the inner objective.
We propose the $\Delta$-STN, an improved hypernetwork architecture which stabilizes training.
arXiv Detail & Related papers (2020-10-26T12:12:23Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to a significantly smaller and computationally cheaper subproblem solver for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale problems with a deep neural network.
In theory, our method requires a much smaller number of communication rounds.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Steepest Descent Neural Architecture Optimization: Escaping Local
Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the appearance of splitting instability in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.