Gradient-Free Training of Quantized Neural Networks
- URL: http://arxiv.org/abs/2410.09734v2
- Date: Mon, 29 Sep 2025 08:57:27 GMT
- Title: Gradient-Free Training of Quantized Neural Networks
- Authors: Noa Cohen, Omkar Joglekar, Dotan Di Castro, Vladimir Tchuiev, Shir Kozlovsky, Michal Moshkovitz
- Abstract summary: Training neural networks requires significant computational resources and energy. Mixed-precision and quantization-aware training reduce bit usage, yet they still depend heavily on computationally expensive gradient-based optimization. We propose a paradigm shift: eliminate gradients altogether.
- Score: 9.348959582516438
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Training neural networks requires significant computational resources and energy. Methods like mixed-precision and quantization-aware training reduce bit usage, yet they still depend heavily on computationally expensive gradient-based optimization. In this work, we propose a paradigm shift: eliminate gradients altogether. One might hope that, in a finite quantized space, finding optimal weights without gradients would be easier, but we theoretically prove that this problem is NP-hard even in simple settings where the continuous case is efficiently solvable. To address this, we introduce a novel heuristic optimization framework that avoids full weight updates and significantly improves efficiency. Empirically, our method achieves performance comparable to that of full-precision gradient-based training on standard datasets and architectures, while using up to 3x less energy and requiring up to 5x fewer parameter updates.
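To make the idea of gradient-free search over a finite weight grid concrete, here is a minimal sketch. It is not the paper's framework, whose details are not given in this listing. It assumes a ternary grid `LEVELS`, a toy linear sign classifier, and a simple accept-if-not-worse local search that changes one weight per step; the names `errors`, `local_search`, and the toy data are hypothetical.

```python
import random

LEVELS = (-1, 0, 1)  # assumed ternary quantization grid (illustrative only)


def errors(w, data):
    """Count misclassified points for a sign-of-dot-product classifier."""
    wrong = 0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if score > 0 else -1
        wrong += pred != y
    return wrong


def local_search(data, dim, steps=500, seed=0):
    """Gradient-free local search: change one quantized weight per step."""
    rng = random.Random(seed)
    w = [rng.choice(LEVELS) for _ in range(dim)]
    initial = best = errors(w, data)
    for _ in range(steps):
        i = rng.randrange(dim)  # pick a single coordinate, not the full vector
        old = w[i]
        w[i] = rng.choice([v for v in LEVELS if v != old])
        cand = errors(w, data)
        if cand <= best:  # keep moves that do not increase the error count
            best = cand
        else:  # otherwise undo the single-weight change
            w[i] = old
    return w, initial, best


# Toy data labelled by a hidden ternary weight vector (hypothetical example).
rng = random.Random(1)
target = [1, -1, 0, 1]
points = [[rng.randint(-3, 3) for _ in target] for _ in range(200)]
data = [
    (x, 1 if sum(t * xi for t, xi in zip(target, x)) > 0 else -1)
    for x in points
]

w, initial, final = local_search(data, dim=len(target))
print("weights:", w, "errors before/after:", initial, final)
```

Because a move is kept only when it does not raise the error count, the final count never exceeds the initial one, but nothing guarantees reaching the optimum, which is consistent with the NP-hardness result stated in the abstract.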
Related papers
- Training of Spiking Neural Networks with Expectation-Propagation [9.24888258922809]
We propose a unifying message-passing framework for training spiking neural networks (SNNs). Our gradient-free method is capable of learning the marginal distributions of network parameters and simultaneously marginalizes parameters, such as the outputs of hidden layers.
arXiv Detail & Related papers (2025-06-30T11:59:56Z) - Sparks of Quantum Advantage and Rapid Retraining in Machine Learning [0.0]
In this study, we optimize a powerful neural network architecture for representing complex functions with minimal parameters.
We introduce rapid retraining capability, enabling the network to be retrained with new data without reprocessing old samples.
Our findings suggest that with further advancements in quantum hardware and algorithm optimization, quantum-optimized machine learning models could have broad applications.
arXiv Detail & Related papers (2024-07-22T19:55:44Z) - Approximation and Gradient Descent Training with Neural Networks [0.0]
Recent work extends a neural tangent kernel (NTK) optimization argument to an under-parametrized regime.
This paper establishes analogous results for networks trained by gradient descent.
arXiv Detail & Related papers (2024-05-19T23:04:09Z) - Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z) - Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the required additional data.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z) - Efficient Neural PDE-Solvers using Quantization Aware Training [71.0934372968972]
We show that quantization can successfully lower the computational cost of inference while maintaining performance.
Our results on four standard PDE datasets and three network architectures show that quantization-aware training works across settings and three orders of magnitude in FLOPs.
arXiv Detail & Related papers (2023-08-14T09:21:19Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning [0.0]
Machine learning algorithms rely heavily on gradient-based optimization algorithms, such as gradient descent and the like.
The overall performance depends on the appearance of local minima and barren plateaus, which slow down calculations and lead to non-optimal solutions.
In this paper we introduce a generic strategy to accelerate and improve the overall performance of such methods, making it possible to alleviate the effect of barren plateaus and local minima.
arXiv Detail & Related papers (2023-04-13T18:26:05Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Combinatorial optimization for low bit-width neural networks [23.466606660363016]
Low-bit-width neural networks have been extensively explored for deployment on edge devices to reduce computational resources.
Existing approaches have focused on gradient-based optimization in a two-stage train-and-compress setting.
We show that a combination of greedy coordinate descent and this novel approach can attain competitive accuracy on binary classification tasks.
arXiv Detail & Related papers (2022-06-04T15:02:36Z) - Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z) - neos: End-to-End-Optimised Summary Statistics for High Energy Physics [0.0]
Deep learning has yielded powerful tools to automatically compute gradients of computations.
This is because training a neural network equates to iteratively updating its parameters using gradient descent to find the minimum of a loss function.
Deep learning is then a subset of a broader paradigm; a workflow with free parameters that is end-to-end optimisable.
arXiv Detail & Related papers (2022-03-10T14:08:05Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training, and gradient-based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Efficient Neural Network Training via Forward and Backward Propagation Sparsification [26.301103403328312]
We propose an efficient sparse training method with completely sparse forward and backward passes.
Our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
arXiv Detail & Related papers (2021-11-10T13:49:47Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum on vision, and achieves state-of-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Universality of Gradient Descent Neural Network Training [0.0]
We discuss the question of whether it is always possible to redesign a neural network so that it trains well with gradient descent.
The construction is not intended for practical computations, but it provides some orientation on the possibilities of meta-learning and related approaches.
arXiv Detail & Related papers (2020-07-27T16:17:19Z) - Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even with weights of constant magnitude or even when weights are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.