Gradient-Free Neural Network Training on the Edge
- URL: http://arxiv.org/abs/2410.09734v1
- Date: Sun, 13 Oct 2024 05:38:39 GMT
- Title: Gradient-Free Neural Network Training on the Edge
- Authors: Dotan Di Castro, Omkar Joglekar, Shir Kozlovsky, Vladimir Tchuiev, Michal Moshkovitz,
- Abstract summary: Training neural networks is computationally heavy and energy-intensive.
This work presents a novel technique for training neural networks without needing gradient.
We show that it is possible to train models without gradient-based optimization techniques by identifying erroneous contributions of each neuron towards the expected classification.
- Score: 12.472204825917629
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Training neural networks is computationally heavy and energy-intensive. Many methodologies were developed to save computational requirements and energy by reducing the precision of network weights at inference time and introducing techniques such as rounding, stochastic rounding, and quantization. However, most of these techniques still require full gradient precision at training time, which makes training such models prohibitive on edge devices. This work presents a novel technique for training neural networks without needing gradients. This enables a training process where all the weights are one or two bits, without any hidden full precision computations. We show that it is possible to train models without gradient-based optimization techniques by identifying erroneous contributions of each neuron towards the expected classification and flipping the relevant bits using logical operations. We tested our method on several standard datasets and achieved performance comparable to corresponding gradient-based baselines with a fraction of the compute power.
Related papers
- Training of Spiking Neural Networks with Expectation-Propagation [9.24888258922809]
We propose a unifying message-passing framework for training spiking neural networks (SNNs)<n>Our gradient-free method is capable of learning the marginal distributions of network parameters and simultaneously marginalizes parameters, such as the outputs of hidden layers.
arXiv Detail & Related papers (2025-06-30T11:59:56Z) - Sparks of Quantum Advantage and Rapid Retraining in Machine Learning [0.0]
In this study, we optimize a powerful neural network architecture for representing complex functions with minimal parameters.
We introduce rapid retraining capability, enabling the network to be retrained with new data without reprocessing old samples.
Our findings suggest that with further advancements in quantum hardware and algorithm optimization, quantum-optimized machine learning models could have broad applications.
arXiv Detail & Related papers (2024-07-22T19:55:44Z) - Approximation and Gradient Descent Training with Neural Networks [0.0]
Recent work extends a neural tangent kernel (NTK) optimization argument to an under-parametrized regime.
This paper establishes analogous results for networks trained by gradient descent.
arXiv Detail & Related papers (2024-05-19T23:04:09Z) - Gradient-free neural topology optimization [0.0]
gradient-free algorithms require many more iterations to converge when compared to gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z) - Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the required additional data.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z) - Efficient Neural PDE-Solvers using Quantization Aware Training [71.0934372968972]
We show that quantization can successfully lower the computational cost of inference while maintaining performance.
Our results on four standard PDE datasets and three network architectures show that quantization-aware training works across settings and three orders of FLOPs magnitudes.
arXiv Detail & Related papers (2023-08-14T09:21:19Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning [0.0]
Machine learning algorithms heavily rely on optimization algorithms based on gradients, such as gradient descent and alike.
The overall performance is dependent on the appearance of local minima and barren plateaus, which slow-down calculations and lead to non-optimal solutions.
In this paper we introduce a generic strategy to accelerate and improve the overall performance of such methods, allowing to alleviate the effect of barren plateaus and local minima.
arXiv Detail & Related papers (2023-04-13T18:26:05Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Combinatorial optimization for low bit-width neural networks [23.466606660363016]
Low-bit width neural networks have been extensively explored for deployment on edge devices to reduce computational resources.
Existing approaches have focused on gradient-based optimization in a two-stage train-and-compress setting.
We show that a combination of greedy coordinate descent and this novel approach can attain competitive accuracy on binary classification tasks.
arXiv Detail & Related papers (2022-06-04T15:02:36Z) - Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep
Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy and time consuming, thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z) - neos: End-to-End-Optimised Summary Statistics for High Energy Physics [0.0]
Deep learning has yielded powerful tools to automatically compute gradients of computations.
This is because training a neural network equates to iteratively updating its parameters using gradient descent to find the minimum of a loss function.
Deep learning is then a subset of a broader paradigm; a workflow with free parameters that is end-to-end optimisable.
arXiv Detail & Related papers (2022-03-10T14:08:05Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Efficient Neural Network Training via Forward and Backward Propagation
Sparsification [26.301103403328312]
We propose an efficient sparse training method with completely sparse forward and backward passes.
Our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
arXiv Detail & Related papers (2021-11-10T13:49:47Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
textscAdaMomentum on vision, and achieves state-the-art results consistently on other tasks including language processing.
textscAdaMomentum on vision, and achieves state-the-art results consistently on other tasks including language processing.
textscAdaMomentum on vision, and achieves state-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture method for initializing neural networks.
It is based on a simple agnostic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%+1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Universality of Gradient Descent Neural Network Training [0.0]
We discuss the question if it is always possible to redesign a neural network so that it trains well with gradient descent.
The construction is not intended for practical computations, but it provides some orientation on the possibilities of meta-learning and related approaches.
arXiv Detail & Related papers (2020-07-27T16:17:19Z) - Training highly effective connectivities within neural networks with
randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even with weights constant magnitude or even when weights are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.