Block-local learning with probabilistic latent representations
- URL: http://arxiv.org/abs/2305.14974v2
- Date: Fri, 27 Oct 2023 19:02:37 GMT
- Title: Block-local learning with probabilistic latent representations
- Authors: David Kappel, Khaleelulla Khan Nazeer, Cabrel Teguemne Fokam,
Christian Mayr, Anand Subramoney
- Abstract summary: Locking and weight transport are problems because they prevent efficient parallelization and horizontal scaling of the training process.
We propose a new method to address both these problems and scale up the training of large models.
We present results on a variety of tasks and architectures, demonstrating state-of-the-art performance using block-local learning.
- Score: 2.839567756494814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ubiquitous backpropagation algorithm requires sequential updates through
the network, introducing a locking problem. In addition, backpropagation relies
on the transpose of the forward weight matrices to compute updates, introducing a
weight transport problem across the network. Locking and weight transport are
problems because they prevent efficient parallelization and horizontal scaling
of the training process. We propose a new method to address both these problems
and scale up the training of large models. Our method works by dividing a deep
neural network into blocks and introduces a feedback network that propagates
the information from the targets backwards to provide auxiliary local losses.
Forward and backward propagation can operate in parallel and with different
sets of weights, addressing the problems of locking and weight transport. Our
approach derives from a statistical interpretation of training that treats
output activations of network blocks as parameters of probability
distributions. The resulting learning framework uses these parameters to
evaluate the agreement between forward and backward information. Error
backpropagation is then performed locally within each block, leading to
"block-local" learning. Several previously proposed alternatives to error
backpropagation emerge as special cases of our model. We present results on a
variety of tasks and architectures, demonstrating state-of-the-art performance
using block-local learning. These results provide a new principled framework
for training networks in a distributed setting.
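To make the mechanism concrete, here is a minimal PyTorch sketch of the idea described above: the network is split into blocks, a separate feedback network carries target information backwards with its own weights, and each block is trained only on a local loss that scores the agreement between its forward activations and the feedback signal. The block sizes, the single-layer feedback mapping, and the fixed-variance Gaussian reading of activations (which reduces the agreement term to a squared error) are illustrative assumptions, not the paper's exact construction.
```python
# Minimal sketch of block-local learning with a feedback network (assumed details,
# not the authors' implementation). Each block's input is detached, so gradients
# never cross block boundaries; feedback blocks use their own weights, so no
# transpose of forward weights (weight transport) is needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardBlock(nn.Module):
    """One block of the forward network, trained only through its local loss."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(),
                                 nn.Linear(d_out, d_out))

    def forward(self, x):
        return self.net(x)

class FeedbackBlock(nn.Module):
    """Feedback mapping that carries target information back to one block."""
    def __init__(self, d_target, d_block):
        super().__init__()
        self.net = nn.Linear(d_target, d_block)

    def forward(self, y):
        return self.net(y)

def agreement_loss(forward_act, feedback_act):
    # Read both activations as means of fixed-variance Gaussians; their
    # agreement then reduces to a squared-error term (illustrative choice).
    return F.mse_loss(forward_act, feedback_act)

# Toy configuration: three blocks, 10-class classification on 784-dim inputs.
dims = [784, 256, 128, 10]
fwd_blocks = nn.ModuleList([ForwardBlock(dims[i], dims[i + 1]) for i in range(3)])
fb_blocks = nn.ModuleList([FeedbackBlock(dims[-1], dims[i + 1]) for i in range(2)])
optimizer = torch.optim.Adam(list(fwd_blocks.parameters()) +
                             list(fb_blocks.parameters()), lr=1e-3)

def train_step(x, labels):
    # Forward pass: detaching each block's input keeps error backpropagation
    # strictly local to that block ("block-local" learning).
    h, acts = x, []
    for block in fwd_blocks:
        h = block(h.detach())
        acts.append(h)

    # Feedback pass: propagate target information backwards to every hidden
    # block (run sequentially here; in a distributed setting the forward and
    # feedback passes could run in parallel on different workers).
    y_onehot = F.one_hot(labels, num_classes=dims[-1]).float()
    feedback = [fb(y_onehot) for fb in fb_blocks]

    # Local losses: the output block uses the task loss, hidden blocks use the
    # agreement between forward activations and feedback signals.
    loss = F.cross_entropy(acts[-1], labels)
    loss = loss + sum(agreement_loss(a, t) for a, t in zip(acts[:-1], feedback))

    optimizer.zero_grad()
    loss.backward()      # no gradient flows across detached block boundaries
    optimizer.step()
    return loss.item()

# Example usage with random data:
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))
print(train_step(x, labels))
```
Detaching block inputs is what would let each block (and its feedback counterpart) sit on a separate device and update as soon as its local loss is available; the paper derives the precise form of the local losses from its probabilistic model rather than the squared-error stand-in used here.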
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Momentum Auxiliary Network for Supervised Local Learning [7.5717621206854275]
Supervised local learning segments the network into multiple local blocks updated by independent auxiliary networks.
We propose a Momentum Auxiliary Network (MAN) that establishes a dynamic interaction mechanism.
Our method can reduce GPU memory usage by more than 45% on the ImageNet dataset compared to end-to-end training.
arXiv Detail & Related papers (2024-07-08T05:31:51Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks [9.718519843862937]
We introduce a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize sub-neural networks separately.
Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations.
arXiv Detail & Related papers (2023-12-20T08:02:33Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for scaling up the distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning aims to incrementally absorb new concepts without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net).
IF2Net allows a single network to inherently learn unlimited mapping rules without being told task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Latent Iterative Refinement for Modular Source Separation [44.78689915209527]
Traditional source separation approaches train deep neural network models end-to-end with all the data available at once.
We argue that we can significantly increase resource efficiency during both training and inference stages.
arXiv Detail & Related papers (2022-11-22T00:02:57Z)
- Block-wise Training of Residual Networks via the Minimizing Movement Scheme [10.342408668490975]
We develop a layer-wise training method, particularly well suited to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space.
The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity.
It alleviates the stagnation problem observed in layer-wise training, whereby greedily trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth.
arXiv Detail & Related papers (2022-10-03T14:03:56Z)
- Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning methods.
However, deep networks lack generalisability: they will not perform as well on a new (testing) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z)
- Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations [143.3053365553897]
We describe a procedure for removing dependency on a cohort of training data from a trained deep network.
We introduce a new bound on how much information can be extracted per query about the forgotten cohort.
We exploit the connections between the activation and weight dynamics of a DNN inspired by Neural Tangent Kernels to compute the information in the activations.
arXiv Detail & Related papers (2020-03-05T23:17:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.