Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed)
Neural Networks
- URL: http://arxiv.org/abs/2308.09858v2
- Date: Mon, 9 Oct 2023 18:00:00 GMT
- Title: Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed)
Neural Networks
- Authors: Yequan Zhao, Xinling Yu, Zhixiong Chen, Ziyue Liu, Sijia Liu and Zheng
Zhang
- Abstract summary: Backward propagation (BP) is widely used to compute the gradients in neural network training.
It is hard to implement BP on edge devices due to the lack of hardware and software resources to support automatic differentiation.
This paper presents a completely BP-free framework that only requires forward propagation to train realistic neural networks.
- Score: 15.188785164091987
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Backward propagation (BP) is widely used to compute the gradients in neural
network training. However, it is hard to implement BP on edge devices due to
the lack of hardware and software resources to support automatic
differentiation. This has tremendously increased the design complexity and
time-to-market of on-device training accelerators. This paper presents a
completely BP-free framework that only requires forward propagation to train
realistic neural networks. Our technical contributions are three-fold. Firstly,
we present a tensor-compressed variance reduction approach to greatly improve
the scalability of zeroth-order (ZO) optimization, making it feasible to handle
a network size that is beyond the capability of previous ZO approaches.
Secondly, we present a hybrid gradient evaluation approach to improve the
efficiency of ZO training. Finally, we extend our BP-free training framework to
physics-informed neural networks (PINNs) by proposing a sparse-grid approach to
estimate the derivatives in the loss function without using BP. Our BP-free
training loses only a small amount of accuracy on the MNIST dataset compared with
standard first-order training. We also demonstrate successful results in training a
PINN to solve a 20-dimensional Hamilton-Jacobi-Bellman PDE. This memory-efficient,
BP-free approach may serve as a foundation for near-future on-device training on
many resource-constrained platforms (e.g., FPGA, ASIC, micro-controllers, and
photonic chips).
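To make the framework above concrete, here is a minimal, self-contained Python/NumPy sketch of the general recipe the abstract describes: the PDE derivatives in a PINN-style loss are approximated with central finite differences (a simple stand-in for the paper's sparse-grid estimation), and parameter gradients come from a two-sided zeroth-order randomized gradient estimator, so training uses forward passes only. The tensor-train weight compression, variance-reduction scheme, and hybrid gradient evaluation that make the actual method scale are omitted; the toy network, loss, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fully connected network u(x; theta) for a 1-D toy problem,
# evaluated with forward passes only (no automatic differentiation anywhere).
sizes = [1, 16, 16, 1]
theta = np.concatenate([rng.normal(0.0, 0.3, size=m * n + n)
                        for m, n in zip(sizes[:-1], sizes[1:])])

def forward(theta, x):
    """Evaluate the network on a batch x of shape (B, 1)."""
    h, idx = x, 0
    for layer, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = theta[idx:idx + m * n].reshape(m, n); idx += m * n
        b = theta[idx:idx + n]; idx += n
        h = h @ W + b
        if layer < len(sizes) - 2:          # hidden layers use tanh, output stays linear
            h = np.tanh(h)
    return h

def loss(theta, x):
    """PINN-style loss for u''(x) = -pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0."""
    eps = 1e-2
    # Central finite difference for u'': a crude BP-free stand-in for the
    # paper's sparse-grid derivative estimation.
    u_xx = (forward(theta, x + eps) - 2.0 * forward(theta, x)
            + forward(theta, x - eps)) / eps**2
    f = -np.pi**2 * np.sin(np.pi * x)
    residual = np.mean((u_xx - f) ** 2)
    boundary = forward(theta, np.array([[0.0], [1.0]]))
    return residual + 10.0 * np.mean(boundary ** 2)

def zo_gradient(theta, x, n_samples=8, mu=1e-3):
    """Two-sided zeroth-order randomized gradient estimator (forward passes only)."""
    g = np.zeros_like(theta)
    for _ in range(n_samples):
        xi = rng.standard_normal(theta.shape)
        g += (loss(theta + mu * xi, x) - loss(theta - mu * xi, x)) / (2.0 * mu) * xi
    return g / n_samples

# Plain SGD on the zeroth-order gradient estimate.
lr = 1e-3
for step in range(500):
    x = rng.uniform(0.05, 0.95, size=(64, 1))   # collocation points for this step
    theta -= lr * zo_gradient(theta, x)
```

Each step costs 2 x n_samples extra forward evaluations instead of one backward pass, and the variance of such an estimator grows with the number of perturbed parameters; this is why the paper compresses the trainable parameters into low-rank tensor factors, so that the perturbation dimension, and hence the variance, stays manageable.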
Related papers
- Asymmetrical estimator for training encapsulated deep photonic neural networks [10.709758849326061]
Asymmetrical training (AT) is a BP-based training method that can perform training on an encapsulated deep network.
AT offers significantly improved time and energy efficiency compared to existing BP-PNN methods.
We demonstrate AT's error-tolerant and calibration-free training for encapsulated integrated photonic deep networks.
arXiv Detail & Related papers (2024-05-28T17:27:20Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- One Forward is Enough for Neural Network Training via Likelihood Ratio Method [47.013384887197454]
Backpropagation (BP) is the mainstream approach for gradient computation in neural network training.
We develop a unified likelihood ratio (ULR) method for gradient estimation with just one forward propagation.
arXiv Detail & Related papers (2023-05-15T19:02:46Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- DNN Training Acceleration via Exploring GPGPU Friendly Sparsity [16.406482603838157]
We propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns.
We then develop an SGD-based search algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose the sensitivity-aware dropout method to dynamically drop the input feature maps based on their sensitivity so as to achieve greater forward and backward training acceleration.
arXiv Detail & Related papers (2022-03-11T01:32:03Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using an evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on an inference-only hardware without the need to use backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch (see the mask-construction sketch after this list).
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
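As referenced in the N:M structured-sparsity entry above, the sketch below illustrates only what an N:M fine-grained structured sparsity pattern is: within every group of M consecutive weights along the input dimension, the N largest-magnitude entries are kept and the rest are zeroed. The paper's contribution is training such networks from scratch; its training recipe is not reproduced here, and the function name, shapes, and defaults are hypothetical.

```python
import numpy as np

def nm_sparsity_mask(weight, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m."""
    out_dim, in_dim = weight.shape
    assert in_dim % m == 0, "input dimension must be divisible by the group size m"
    groups = np.abs(weight).reshape(out_dim, in_dim // m, m)
    order = np.argsort(groups, axis=-1)           # ascending by magnitude within each group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, order[..., -n:], 1.0, axis=-1)   # keep the top-n per group
    return mask.reshape(out_dim, in_dim)

# Usage: prune a random weight matrix to 2:4 sparsity (50% zeros in a regular layout).
w = np.random.default_rng(0).normal(size=(8, 16))
sparse_w = w * nm_sparsity_mask(w, n=2, m=4)
assert np.count_nonzero(sparse_w) == w.size // 2
```

Because the non-zeros are laid out regularly (e.g., 2 out of every 4 entries), the zeroed weights can be skipped with fixed-size indexing, which is why this pattern is attractive for acceleration on hardware with structured-sparsity support.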