Go beyond End-to-End Training: Boosting Greedy Local Learning with
Context Supply
- URL: http://arxiv.org/abs/2312.07636v1
- Date: Tue, 12 Dec 2023 10:25:31 GMT
- Title: Go beyond End-to-End Training: Boosting Greedy Local Learning with
Context Supply
- Authors: Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang and Erping Li
- Abstract summary: Greedy local learning partitions the network into gradient-isolated modules and trains each module in a supervised manner based on local preliminary losses.
As the number of gradient-isolated modules increases, the performance of the local learning scheme degrades substantially.
We propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss.
- Score: 0.12187048691454236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing
intermediate activations for back-propagation, resulting in a large memory
footprint on GPUs and restricted model parallelization. As an alternative,
greedy local learning partitions the network into gradient-isolated modules and
trains each module in a supervised manner based on local preliminary losses,
thereby providing asynchronous and parallel training methods that substantially
reduce memory cost. However, empirical experiments reveal that as the number of
gradient-isolated modules increases, the performance of the local learning
scheme degrades substantially, severely limiting its scalability. To
avoid this issue, we theoretically analyze the greedy local learning from the
standpoint of information theory and propose a ContSup scheme, which
incorporates context supply between isolated modules to compensate for
information loss. Experiments on benchmark datasets (i.e., CIFAR, SVHN, STL-10)
achieve state-of-the-art results and indicate that our proposed method can
significantly improve the performance of greedy local learning with minimal
memory and computational overhead, allowing the number of isolated modules to
be increased. Our code is available at https://github.com/Tab-ct/ContSup.
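To make the training scheme concrete, the following is a minimal PyTorch sketch of greedy local learning with gradient-isolated modules plus a simple context-supply path between them. The names (LocalModule, local_train_step, context_proj), the 1x1-convolution context projection, and the concatenation-based context injection are illustrative assumptions for this sketch, not the exact ContSup architecture; see the linked repository for the authors' implementation.

```python
# A minimal sketch of greedy local learning with a context-supply path between
# gradient-isolated modules. Names, the 1x1-conv context projection, and the
# concatenation-based injection are assumptions for illustration only.
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    """One gradient-isolated segment with its own auxiliary classifier head."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )
        # Local head used only to compute this module's preliminary loss.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes)
        )

    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)

def local_train_step(modules, context_proj, x, y, optimizers, criterion):
    """One greedy local step: every module receives the previous module's
    detached features concatenated with a cheap 'context' signal derived from
    the raw input, so information lost by earlier modules can be re-supplied."""
    context = context_proj(x)  # global context signal (kept frozen in this sketch)
    feat = x
    for module, opt in zip(modules, optimizers):
        # detach() blocks gradients from crossing module boundaries (isolation).
        inp = torch.cat([feat.detach(), context.detach()], dim=1)
        feat, logits = module(inp)
        loss = criterion(logits, y)
        opt.zero_grad()
        loss.backward()  # back-propagation stays inside this module
        opt.step()
    return loss.item()

# Example wiring: three isolated modules on 32x32 RGB inputs (CIFAR-like).
num_classes, ctx_ch = 10, 8
context_proj = nn.Conv2d(3, ctx_ch, kernel_size=1)
modules = nn.ModuleList([
    LocalModule(3 + ctx_ch, 32, num_classes),
    LocalModule(32 + ctx_ch, 64, num_classes),
    LocalModule(64 + ctx_ch, 64, num_classes),
])
optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in modules]
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, num_classes, (4,))
local_train_step(modules, context_proj, x, y, optimizers, criterion)
```

Because back-propagation never crosses a detach() boundary, activations outside the module currently being updated need not be kept, which is the source of the memory savings described above; the re-supplied context is one simple way to counteract the information loss that otherwise grows with the number of isolated modules.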
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers can alleviate the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion [7.9514535887836795]
We propose a novel model that performs hierarchical locally supervised learning and patch-level feature fusion on auxiliary networks.
We conduct experiments on CIFAR-10, STL-10, SVHN, and ImageNet datasets, and the results demonstrate that our proposed HPFF significantly outperforms previous approaches.
arXiv Detail & Related papers (2024-07-08T06:05:19Z) - Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation [70.43845294145714]
Relieving the reliance of neural network training on global back-propagation (BP) has emerged as a notable research topic.
We propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules.
Our method can be integrated into both local-BP and BP-free settings.
arXiv Detail & Related papers (2024-06-07T19:10:31Z) - Semi-Federated Learning: Convergence Analysis and Optimization of A
Hybrid Learning Framework [70.83511997272457]
We propose a semi-federated learning (SemiFL) paradigm to leverage both the base station (BS) and devices for a hybrid implementation of centralized learning (CL) and FL.
We propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers.
arXiv Detail & Related papers (2023-10-04T03:32:39Z) - Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - Locally Supervised Learning with Periodic Global Guidance [19.41730292017383]
We propose Periodically Guided local Learning (PGL), which periodically reinstates the global objective into the local-loss-based training of neural networks.
We show that a simple periodic guidance scheme yields significant performance gains while having a low memory footprint (a minimal sketch of this idea appears after the list below).
arXiv Detail & Related papers (2022-08-01T13:06:26Z) - BackLink: Supervised Local Training with Backward Links [2.104758015212034]
This work proposes a novel local training algorithm, BackLink, which introduces inter-module backward dependency and allows errors to flow between modules.
Our method can lead to up to a 79% reduction in memory cost and a 52% reduction in simulation runtime in ResNet110 compared to standard BP.
arXiv Detail & Related papers (2022-05-14T21:49:47Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Revisiting Locally Supervised Learning: an Alternative to End-to-end
Training [36.43515074019875]
We propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible.
We show that InfoPro is capable of achieving competitive performance with less than 40% of the memory footprint of E2E training.
arXiv Detail & Related papers (2021-01-26T15:02:18Z)
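For comparison, the periodic-guidance idea summarized in the PGL entry above can be sketched as follows. This is a toy illustration under stated assumptions, reusing the LocalModule class from the earlier sketch (without the extra context channels); the schedule and function names are not the PGL authors' implementation.

```python
# A toy sketch of local-loss training with periodic global guidance: most steps
# update each module from its own auxiliary loss, while every `period`-th step
# performs one end-to-end backward pass that reinstates the global objective.
import torch
import torch.nn as nn

# Assumes LocalModule from the earlier sketch is in scope.

def periodically_guided_step(step, period, modules, x, y, optimizers, criterion):
    feat = x
    if step % period == 0:
        # Global guidance step: gradients flow end-to-end through all modules.
        for module in modules:
            feat, logits = module(feat)
        loss = criterion(logits, y)  # global objective from the final head
        for opt in optimizers:
            opt.zero_grad()
        loss.backward()
        for opt in optimizers:
            opt.step()
    else:
        # Ordinary local steps: each module trains only on its own auxiliary loss.
        for module, opt in zip(modules, optimizers):
            feat, logits = module(feat.detach())  # isolation between modules
            loss = criterion(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return loss.item()

# Example: two isolated modules, globally guided every 10th step.
guided_modules = nn.ModuleList([LocalModule(3, 32, 10), LocalModule(32, 64, 10)])
guided_opts = [torch.optim.SGD(m.parameters(), lr=0.1) for m in guided_modules]
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
for step in range(20):
    periodically_guided_step(step, 10, guided_modules, x, y, guided_opts, criterion)
```

On guidance steps the full activation chain is kept for a single end-to-end backward pass, so peak memory briefly matches E2E training; all other steps retain the low footprint of purely local updates.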
This list is automatically generated from the titles and abstracts of the papers on this site.