SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism
- URL: http://arxiv.org/abs/2602.00062v2
- Date: Tue, 03 Feb 2026 04:12:03 GMT
- Title: SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism
- Authors: Ming-Yao Ho, Cheng-Kai Wang, You-Teng Lin, Hung-Hsuan Chen
- Abstract summary: This paper introduces a new training methodology, Supervised Contrastive Parallel Learning (SCPL), that addresses this issue by decoupling BP and transforming a long gradient flow into multiple short ones. Detailed experiments demonstrate the efficiency and effectiveness of our model compared to BP, Early Exit, GPipe, and Associated Learning (AL), a state-of-the-art method for decoupling backpropagation. SCPL provides a practical pathway for organizations to develop and deploy advanced information systems more cost-effectively and with greater agility.
- Score: 2.4349098308669594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard end-to-end backpropagation (BP) algorithm is a primary driver of modern AI, but it is also the source of inefficiency in training deep networks. This paper introduces a new training methodology, Supervised Contrastive Parallel Learning (SCPL), that addresses this issue by decoupling BP and transforming a long gradient flow into multiple short ones. This design enables the simultaneous computation of parameter gradients in different layers, achieving superior model parallelism and enhancing training throughput. Detailed experiments are presented to demonstrate the efficiency and effectiveness of our model compared to BP, Early Exit, GPipe, and Associated Learning (AL), a state-of-the-art method for decoupling backpropagation. By mitigating a fundamental performance bottleneck, SCPL provides a practical pathway for organizations to develop and deploy advanced information systems more cost-effectively and with greater agility. The experimental code is released for reproducibility. https://github.com/minyaho/scpl/
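The decoupling mechanism is easy to picture in code: each block trains against its own local loss and passes a detached activation forward, so every backward pass is short and blocks can update concurrently. The sketch below is illustrative only, with a plain cross-entropy head standing in for SCPL's supervised contrastive local loss; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    """One pipeline stage with its own short gradient flow."""
    def __init__(self, in_dim, out_dim, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.head = nn.Linear(out_dim, num_classes)  # local auxiliary objective

    def forward(self, x, y):
        h = self.encoder(x)
        loss = F.cross_entropy(self.head(h), y)  # gradients stay inside this block
        return h.detach(), loss                  # detach() severs the global BP chain

blocks = nn.ModuleList([LocalBlock(784, 256, 10), LocalBlock(256, 256, 10)])
opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for block, opt in zip(blocks, opts):
    x, loss = block(x, y)
    opt.zero_grad()
    loss.backward()   # each backward traverses only one block,
    opt.step()        # so blocks on different devices can update concurrently
```

Because each `backward()` touches a single block, the blocks can live on different devices and run their updates in parallel, which is the source of the throughput gain.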
Related papers
- PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation [47.510888611491]
Reinforcement Learning (RL) is increasingly utilized to enhance the reasoning capabilities of Large Language Models (LLMs). This paper introduces PipelineRL, an approach designed to achieve a superior trade-off between hardware efficiency and data on-policyness.
arXiv Detail & Related papers (2025-09-23T15:15:21Z)
- DeInfoReg: A Decoupled Learning Framework for Better Training Throughput [1.8434042562191815]
This paper introduces Decoupled Supervised Learning with Information Regularization (DeInfoReg). It transforms a long gradient flow into multiple shorter ones, thereby mitigating the vanishing gradient problem. We compare our proposed method with standard backpropagation and other gradient flow decomposition techniques.
arXiv Detail & Related papers (2025-06-22T22:50:06Z)
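The shorter gradient flows that DeInfoReg (like SCPL above) relies on can be checked directly in autograd: once an activation is detached, a local loss's backward pass reaches only the parameters behind it. A minimal demonstration of that effect, assuming nothing about DeInfoReg's information-regularization loss itself:

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(8, 8)
layer2 = nn.Linear(8, 1)

x = torch.randn(4, 8)
h = layer1(x).detach()           # cut the gradient flow between the two layers
loss = layer2(h).pow(2).mean()
loss.backward()

print(layer2.weight.grad is not None)  # True: the short local flow reaches layer2
print(layer1.weight.grad is None)      # True: the flow never reaches layer1
```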
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
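The simplest, non-learnable form of depth scaling-up just reuses a smaller model's layers to initialize a deeper one; LESA's contribution is to learn the inserted layers' parameters instead. A naive duplication baseline, for contrast (not LESA's method):

```python
import copy
import torch.nn as nn

def scale_up_depth(layers: nn.ModuleList) -> nn.ModuleList:
    """Double a model's depth by interleaving copies of existing layers.

    A crude baseline: LESA instead *learns* the inserted layers'
    parameters from the smaller model's weights.
    """
    expanded = []
    for layer in layers:
        expanded.append(layer)
        expanded.append(copy.deepcopy(layer))  # naive init for the new layer
    return nn.ModuleList(expanded)

small = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
large = scale_up_depth(small)
assert len(large) == 8
```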
- GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge [0.0]
Training powerful and popular deep neural networks comes at very high economic and environmental costs.
This work takes a radically different approach by proposing GreenLightningAI.
The new AI system stores the information required to select the system subset for a given sample.
We show experimentally that the structural information can be kept unmodified when re-training the AI system with new samples.
arXiv Detail & Related papers (2023-12-15T17:34:11Z)
- Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. The method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
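The reward-decomposition idea can be made concrete on a single linear layer: split a per-output feedback signal among the weights in proportion to their share of the output, then strengthen or weaken them accordingly. The toy rule below is a loose sketch of that principle, not the paper's LFP derivation:

```python
import torch

def feedback_step(W: torch.Tensor, x: torch.Tensor,
                  reward: torch.Tensor, lr: float = 0.01) -> torch.Tensor:
    """One toy feedback update for y = W @ x.

    reward[i] > 0 reinforces the weights that produced output i,
    reward[i] < 0 weakens them, each in proportion to its contribution.
    """
    y = W @ x
    contrib = W * x                            # contrib[i, j] = W[i, j] * x[j]
    share = contrib / (y.unsqueeze(1) + 1e-8)  # weight (i, j)'s share of output i
    # Grow the magnitude of helpful weights, shrink harmful ones.
    return W + lr * reward.unsqueeze(1) * share * W.sign()

W = torch.randn(3, 5)
x = torch.randn(5)
reward = torch.tensor([1.0, -0.5, 0.2])  # hypothetical per-output feedback
W = feedback_step(W, x, reward)
```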
- Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks [44.37047471448793]
In this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL).
We propose an innovative PSL framework, namely, efficient parallel split learning (EPSL) to accelerate model training.
We show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy.
arXiv Detail & Related papers (2023-03-26T16:09:48Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of both the dense and the gated encoder during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
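Gated sub-networks of this kind are commonly realized with per-sample channel gates inside the dense encoder, with a straight-through estimator keeping hard gates trainable. A rough, generic sketch (the module is hypothetical, not the paper's architecture):

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Per-sample hard channel gating with a straight-through estimator."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, channels))

    def forward(self, x):
        logits = self.gate(x)
        soft = torch.sigmoid(logits)
        hard = (soft > 0.5).float()          # 0/1 gates in the forward pass
        g = hard + soft - soft.detach()      # gradients flow through `soft`
        return x * g[:, :, None, None]       # zero out gated channels

gate = ChannelGate(32)
out = gate(torch.randn(4, 32, 8, 8))  # some channels zeroed per sample
```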
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
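For context, snnTorch exposes spiking neurons as ordinary PyTorch modules; a leaky integrate-and-fire step unrolled over time looks roughly like this (standard snnTorch usage as we understand it, independent of the IPU-specific release):

```python
import torch
import snntorch as snn

lif = snn.Leaky(beta=0.9)   # leaky integrate-and-fire neuron
mem = lif.init_leaky()      # initialize membrane potential

cur = torch.randn(1, 10)    # input current for one time step
for _ in range(25):         # unroll over time steps
    spk, mem = lif(cur, mem)  # spikes out, membrane state carried forward
```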
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
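The squeeze OREPA exploits follows from the linearity of convolution: parallel branches collapse into one equivalent kernel. The sketch below verifies that algebra for a 3x3 + 1x1 pair; it shows the generic re-parameterization trick, not OREPA's online training procedure:

```python
import torch
import torch.nn.functional as F

def merge_3x3_1x1(w3: torch.Tensor, w1: torch.Tensor) -> torch.Tensor:
    """Fold a parallel 1x1 branch into a 3x3 kernel.

    w3: (out, in, 3, 3), w1: (out, in, 1, 1). Valid because convolution
    is linear in its weights.
    """
    return w3 + F.pad(w1, [1, 1, 1, 1])  # place the 1x1 kernel at the 3x3 center

# Verify equivalence on random data.
x = torch.randn(2, 8, 16, 16)
w3 = torch.randn(4, 8, 3, 3)
w1 = torch.randn(4, 8, 1, 1)
y_branches = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1)
y_merged = F.conv2d(x, merge_3x3_1x1(w3, w1), padding=1)
assert torch.allclose(y_branches, y_merged, atol=1e-5)
```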
- Consistency Training of Multi-exit Architectures for Sensor Data [0.07614628596146598]
We present a novel and architecture-agnostic approach for robust training of multi-exit architectures termed consistent exit training.
We leverage weak supervision to align model outputs via consistency training and jointly optimize dual losses in a multi-task learning fashion over the exits of the network.
arXiv Detail & Related papers (2021-09-27T17:11:25Z)
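A multi-exit network with jointly optimized dual losses can be sketched generically: a task loss per exit plus a consistency term that aligns the early exit with the final one. The module names and the KL-based consistency term are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoExitNet(nn.Module):
    """A backbone with an early exit and a final exit."""
    def __init__(self, in_dim=32, hidden=64, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.exit1 = nn.Linear(hidden, num_classes)   # early exit
        self.stage2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.exit2 = nn.Linear(hidden, num_classes)   # final exit

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return self.exit1(h1), self.exit2(h2)

def multi_exit_loss(logits1, logits2, y, alpha=0.5):
    # Dual task losses over both exits, optimized jointly.
    task = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    # Consistency: the early exit mimics the (detached) final exit's distribution.
    consistency = F.kl_div(F.log_softmax(logits1, dim=1),
                           F.softmax(logits2.detach(), dim=1),
                           reduction="batchmean")
    return task + alpha * consistency
```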
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.