Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed)
Neural Networks
- URL: http://arxiv.org/abs/2308.09858v2
- Date: Mon, 9 Oct 2023 18:00:00 GMT
- Title: Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed)
Neural Networks
- Authors: Yequan Zhao, Xinling Yu, Zhixiong Chen, Ziyue Liu, Sijia Liu and Zheng
Zhang
- Abstract summary: Backward propagation (BP) is widely used to compute the gradients in neural network training.
It is hard to implement BP on edge devices due to the lack of hardware and software resources to support automatic differentiation.
This paper presents a completely BP-free framework that only requires forward propagation to train realistic neural networks.
- Score: 15.188785164091987
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Backward propagation (BP) is widely used to compute the gradients in neural
network training. However, it is hard to implement BP on edge devices due to
the lack of hardware and software resources to support automatic
differentiation. This has tremendously increased the design complexity and
time-to-market of on-device training accelerators. This paper presents a
completely BP-free framework that only requires forward propagation to train
realistic neural networks. Our technical contributions are three-fold. Firstly,
we present a tensor-compressed variance reduction approach to greatly improve
the scalability of zeroth-order (ZO) optimization, making it feasible to handle
a network size that is beyond the capability of previous ZO approaches.
Secondly, we present a hybrid gradient evaluation approach to improve the
efficiency of ZO training. Finally, we extend our BP-free training framework to
physics-informed neural networks (PINNs) by proposing a sparse-grid approach to
estimate the derivatives in the loss function without using BP. Our BP-free
training loses only a small amount of accuracy on the MNIST dataset compared with
standard first-order training. We also demonstrate successful results in training a
PINN to solve a 20-dimensional Hamilton-Jacobi-Bellman PDE. This memory-efficient,
BP-free approach may serve as a foundation for near-future on-device training on
many resource-constrained platforms (e.g., FPGA, ASIC, micro-controllers, and
photonic chips).
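To make the framework above concrete, here is a minimal, self-contained Python/NumPy sketch of the general recipe the abstract describes: the PDE derivatives in a PINN-style loss are approximated with central finite differences (a simple stand-in for the paper's sparse-grid estimation), and parameter gradients come from a two-sided zeroth-order randomized gradient estimator, so training uses forward passes only. The tensor-train weight compression, variance-reduction scheme, and hybrid gradient evaluation that make the actual method scale are omitted; the toy network, loss, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fully connected network u(x; theta) for a 1-D toy problem,
# evaluated with forward passes only (no automatic differentiation anywhere).
sizes = [1, 16, 16, 1]
theta = np.concatenate([rng.normal(0.0, 0.3, size=m * n + n)
                        for m, n in zip(sizes[:-1], sizes[1:])])

def forward(theta, x):
    """Evaluate the network on a batch x of shape (B, 1)."""
    h, idx = x, 0
    for layer, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = theta[idx:idx + m * n].reshape(m, n); idx += m * n
        b = theta[idx:idx + n]; idx += n
        h = h @ W + b
        if layer < len(sizes) - 2:          # hidden layers use tanh, output stays linear
            h = np.tanh(h)
    return h

def loss(theta, x):
    """PINN-style loss for u''(x) = -pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0."""
    eps = 1e-2
    # Central finite difference for u'': a crude BP-free stand-in for the
    # paper's sparse-grid derivative estimation.
    u_xx = (forward(theta, x + eps) - 2.0 * forward(theta, x)
            + forward(theta, x - eps)) / eps**2
    f = -np.pi**2 * np.sin(np.pi * x)
    residual = np.mean((u_xx - f) ** 2)
    boundary = forward(theta, np.array([[0.0], [1.0]]))
    return residual + 10.0 * np.mean(boundary ** 2)

def zo_gradient(theta, x, n_samples=8, mu=1e-3):
    """Two-sided zeroth-order randomized gradient estimator (forward passes only)."""
    g = np.zeros_like(theta)
    for _ in range(n_samples):
        xi = rng.standard_normal(theta.shape)
        g += (loss(theta + mu * xi, x) - loss(theta - mu * xi, x)) / (2.0 * mu) * xi
    return g / n_samples

# Plain SGD on the zeroth-order gradient estimate.
lr = 1e-3
for step in range(500):
    x = rng.uniform(0.05, 0.95, size=(64, 1))   # collocation points for this step
    theta -= lr * zo_gradient(theta, x)
```

Each step costs 2 x n_samples extra forward evaluations instead of one backward pass, and the variance of such an estimator grows with the number of perturbed parameters; this is why the paper compresses the trainable parameters into low-rank tensor factors, so that the perturbation dimension, and hence the variance, stays manageable.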
Related papers
- Asymmetrical estimator for training encapsulated deep photonic neural networks [10.709758849326061]
Asymmetrical training (AT) is a BP-based training method that can perform training on an encapsulated deep network.
AT offers significantly improved time and energy efficiency compared to existing BP-PNN methods.
We demonstrate AT's error-tolerant and calibration-free training for encapsulated integrated photonic deep networks.
arXiv Detail & Related papers (2024-05-28T17:27:20Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- One Forward is Enough for Neural Network Training via Likelihood Ratio Method [47.013384887197454]
Backpropagation (BP) is the mainstream approach for gradient computation in neural network training.
We develop a unified likelihood ratio (ULR) method for gradient estimation with just one forward propagation.
arXiv Detail & Related papers (2023-05-15T19:02:46Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- DNN Training Acceleration via Exploring GPGPU Friendly Sparsity [16.406482603838157]
We propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns.
We then develop an SGD-based search algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose the sensitivity-aware dropout method to dynamically drop the input feature maps based on their sensitivity so as to achieve greater forward and backward training acceleration.
arXiv Detail & Related papers (2022-03-11T01:32:03Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using an evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on an inference-only hardware without the need to use backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch (see the mask-construction sketch after this list).
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
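As referenced in the N:M structured-sparsity entry above, the sketch below illustrates only what an N:M fine-grained structured sparsity pattern is: within every group of M consecutive weights along the input dimension, the N largest-magnitude entries are kept and the rest are zeroed. The paper's contribution is training such networks from scratch; its training recipe is not reproduced here, and the function name, shapes, and defaults are hypothetical.

```python
import numpy as np

def nm_sparsity_mask(weight, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m."""
    out_dim, in_dim = weight.shape
    assert in_dim % m == 0, "input dimension must be divisible by the group size m"
    groups = np.abs(weight).reshape(out_dim, in_dim // m, m)
    order = np.argsort(groups, axis=-1)           # ascending by magnitude within each group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, order[..., -n:], 1.0, axis=-1)   # keep the top-n per group
    return mask.reshape(out_dim, in_dim)

# Usage: prune a random weight matrix to 2:4 sparsity (50% zeros in a regular layout).
w = np.random.default_rng(0).normal(size=(8, 16))
sparse_w = w * nm_sparsity_mask(w, n=2, m=4)
assert np.count_nonzero(sparse_w) == w.size // 2
```

Because the non-zeros are laid out regularly (e.g., 2 out of every 4 entries), the zeroed weights can be skipped with fixed-size indexing, which is why this pattern is attractive for acceleration on hardware with structured-sparsity support.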