Interlocking Backpropagation: Improving depthwise model-parallelism
- URL: http://arxiv.org/abs/2010.04116v3
- Date: Thu, 7 Jul 2022 23:29:56 GMT
- Title: Interlocking Backpropagation: Improving depthwise model-parallelism
- Authors: Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff
Dean, Yarin Gal
- Abstract summary: We introduce a class of intermediary strategies between local and global learning.
These strategies preserve many of the compute-efficiency advantages of local optimisation.
We find that our strategy consistently outperforms local learning in terms of task performance, and outperforms global learning in training efficiency.
- Score: 28.97488430121607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The number of parameters in state-of-the-art neural networks has drastically
increased in recent years. This surge of interest in large-scale neural
networks has motivated the development of new distributed training strategies
enabling such models. One such strategy is model-parallel distributed training.
Unfortunately, model-parallelism can suffer from poor resource utilisation,
wasting compute. In this work, we improve upon recent
developments in an idealised model-parallel optimisation setting: local
learning. Motivated by poor resource utilisation in the global setting and poor
task performance in the local setting, we introduce a class of intermediary
strategies between local and global learning referred to as interlocking
backpropagation. These strategies preserve many of the compute-efficiency
advantages of local optimisation, while recovering much of the task performance
achieved by global optimisation. We assess our strategies on both image-classification
ResNets and Transformer language models, finding that our strategy consistently
outperforms local learning in terms of task performance and outperforms
global learning in training efficiency.
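To make the local-versus-global spectrum concrete, the sketch below is a minimal, hypothetical illustration (not the authors' code; the class, function, and parameter names InterlockingNet, training_step, and overlap are all assumptions) of letting each stage's local auxiliary loss backpropagate through a window of the preceding stages in PyTorch. Setting overlap=0 gives purely local learning, while a window covering all earlier stages approaches global backpropagation; intermediate values correspond to the intermediary strategies described above. The recomputation-based window trades extra forward passes for simplicity and only shows the gradient flow, not the paper's pipelined schedule.

```python
# A minimal, hypothetical sketch of interlocking-style local training.
# Assumptions (not from the paper's code): the network is split into
# sequential stages, each stage has a small auxiliary classifier, and each
# auxiliary loss backpropagates through the preceding `overlap` stages before
# the graph is cut. overlap=0 recovers purely local learning; a window over
# all earlier stages approaches global backpropagation.
import torch
import torch.nn as nn

class InterlockingNet(nn.Module):
    def __init__(self, stages, aux_heads):
        super().__init__()
        self.stages = nn.ModuleList(stages)        # depthwise model partitions
        self.aux_heads = nn.ModuleList(aux_heads)  # one local classifier per stage

    def training_step(self, x, y, loss_fn, overlap=1):
        losses = []
        # Cache detached activations so later windows can recompute locally.
        acts = [x.detach()]
        for i, stage in enumerate(self.stages):
            # Recompute the forward pass over the gradient window ending at stage i.
            start = max(0, i - overlap)
            h = acts[start]
            for j in range(start, i + 1):
                h = self.stages[j](h)
            # Local auxiliary loss; its gradient reaches stages start..i only.
            loss = loss_fn(self.aux_heads[i](h), y)
            loss.backward()
            losses.append(loss.detach())
            # Store the detached activation of stage i for downstream windows.
            with torch.no_grad():
                acts.append(stage(acts[i]))
        return torch.stack(losses)

# Hypothetical usage on a toy MLP split into three stages.
stages = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)]
aux_heads = [nn.Linear(32, 10) for _ in range(3)]
model = InterlockingNet(stages, aux_heads)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
opt.zero_grad()
model.training_step(x, y, nn.CrossEntropyLoss(), overlap=1)
opt.step()
```

With overlap=1, each stage accumulates gradients from its own auxiliary loss and from the loss of the stage immediately downstream before a single optimiser step, which is the interlocking effect the abstract refers to.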
Related papers
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [86.99017195607077]
We address real-time sampling and estimation of autoregressive Markovian sources in wireless networks. We propose a graphical reinforcement learning framework for policy optimization. Theoretically, our proposed policies are transferable, allowing a policy trained on one graph to be effectively applied to structurally similar graphs.
arXiv Detail & Related papers (2026-01-19T02:18:45Z) - Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation [1.0285749562751982]
Backpropagation underpins modern deep learning, yet its reliance on global synchronization limits scalability and incurs high memory costs. In contrast, fully local learning rules are more efficient but often struggle to maintain the cross-layer coordination needed for coherent global learning. We introduce Stochastic Layer-wise Learning (SLL), a layer-wise training algorithm that decomposes the global objective into coordinated layer-local updates.
arXiv Detail & Related papers (2025-05-08T12:32:29Z) - Adaptive Global-Local Representation Learning and Selection for
Cross-Domain Facial Expression Recognition [54.334773598942775]
Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition (CD-FER).
We propose an Adaptive Global-Local Representation Learning and Selection framework.
arXiv Detail & Related papers (2024-01-20T02:21:41Z) - Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation [4.748931281307333]
We introduce an innovative search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers.
This leads to a marked enhancement in deep neural network efficiency.
arXiv Detail & Related papers (2023-08-12T00:16:51Z) - FedSoup: Improving Generalization and Personalization in Federated
Learning via Selective Model Interpolation [32.36334319329364]
Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers.
Recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts.
We propose a novel federated model soup method to optimize the trade-off between local and global performance.
arXiv Detail & Related papers (2023-07-20T00:07:29Z) - Integrating Local Real Data with Global Gradient Prototypes for
Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
Data samples usually follow a long-tailed distribution in the real world, and FL on decentralized, long-tailed data yields a poorly behaved global model.
In this work, we integrate local real data with global gradient prototypes to form locally balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z) - Tensor Decomposition based Personalized Federated Learning [12.420951968273574]
Federated learning (FL) is a new distributed machine learning framework that can achieve reliable collaborative training without collecting users' private data.
Due to FL's frequent communication and average-aggregation strategy, it faces challenges scaling to statistically diverse data and large-scale models.
We propose a personalized FL framework, named Tensor Decomposition based Personalized Federated learning (TDPFed), in which we design a novel tensorized local model with tensorized linear and convolutional layers to reduce the communication cost.
arXiv Detail & Related papers (2022-08-27T08:09:14Z) - Locally Supervised Learning with Periodic Global Guidance [19.41730292017383]
We propose Periodically Guided local Learning (PGL), which periodically reinstates the global objective in the local-loss-based training of neural networks.
We show that a simple periodic guidance scheme begets significant performance gains while having a low memory footprint.
arXiv Detail & Related papers (2022-08-01T13:06:26Z) - RLFlow: Optimising Neural Network Subgraph Transformation with World
Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match the performance of the state of the art on common convolutional networks and outperform it by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Unsupervised Learning for Asynchronous Resource Allocation in Ad-hoc
Wireless Networks [122.42812336946756]
We design an unsupervised learning method based on Aggregation Graph Neural Networks (Agg-GNNs).
We capture the asynchrony by modeling the activation pattern as a characteristic of each node and train a policy-based resource allocation method.
arXiv Detail & Related papers (2020-11-05T03:38:36Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)