MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks
- URL: http://arxiv.org/abs/2507.16279v1
- Date: Tue, 22 Jul 2025 06:50:19 GMT
- Title: MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks
- Authors: Junhao Su, Feiyu Zhu, Hengyu Shi, Tianyang Han, Yurui Qiu, Junfeng Luo, Xiaoming Wei, Jialin Gao
- Abstract summary: We present the Momentum Auxiliary Network++ (MAN++) for supervised local learning. We show that MAN++ achieves performance comparable to end-to-end training while significantly reducing GPU memory usage.
- Score: 10.200277827846076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning typically relies on end-to-end backpropagation for training, a method that inherently suffers from issues such as update locking during parameter optimization, high GPU memory consumption, and a lack of biological plausibility. In contrast, supervised local learning seeks to mitigate these challenges by partitioning the network into multiple local blocks and designing independent auxiliary networks to update each block separately. However, because gradients are propagated solely within individual local blocks, performance degradation occurs, preventing supervised local learning from supplanting end-to-end backpropagation. To address these limitations and facilitate inter-block information flow, we propose the Momentum Auxiliary Network++ (MAN++). MAN++ introduces a dynamic interaction mechanism by employing the Exponential Moving Average (EMA) of parameters from adjacent blocks to enhance communication across the network. The auxiliary network, updated via EMA, effectively bridges the information gap between blocks. Notably, we observed that directly applying EMA parameters can be suboptimal due to feature discrepancies between local blocks. To resolve this issue, we introduce a learnable scaling bias that balances feature differences, thereby further improving performance. We validate MAN++ through extensive experiments on tasks that include image classification, object detection, and image segmentation, utilizing multiple network architectures. The experimental results demonstrate that MAN++ achieves performance comparable to end-to-end training while significantly reducing GPU memory usage. Consequently, MAN++ offers a novel perspective for supervised local learning and presents a viable alternative to conventional training methods.
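The core mechanism described in the abstract (an EMA copy of the adjacent block's parameters, plus a learnable scaling bias that compensates for feature discrepancies between blocks) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' released code: the class name MomentumAuxiliaryHead, the momentum value 0.999, and the per-channel scale/bias shape are all assumptions.

```python
import copy

import torch
import torch.nn as nn


class MomentumAuxiliaryHead(nn.Module):
    """Hypothetical MAN++-style auxiliary head (illustrative sketch).

    Holds an EMA copy of the *next* local block and applies a learnable
    per-channel scale and bias to local features before passing them
    through that copy.
    """

    def __init__(self, next_block: nn.Module, num_channels: int, momentum: float = 0.999):
        super().__init__()
        # Frozen EMA copy of the adjacent block; refreshed manually,
        # never updated by the optimizer.
        self.ema_block = copy.deepcopy(next_block)
        for p in self.ema_block.parameters():
            p.requires_grad_(False)
        self.momentum = momentum
        # Learnable scaling bias that balances feature differences between blocks.
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    @torch.no_grad()
    def update_ema(self, next_block: nn.Module) -> None:
        # theta_ema <- m * theta_ema + (1 - m) * theta_next
        for p_ema, p in zip(self.ema_block.parameters(), next_block.parameters()):
            p_ema.mul_(self.momentum).add_(p, alpha=1.0 - self.momentum)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescaled local features flow through the EMA copy of the next
        # block, bridging the information gap between local blocks.
        return self.ema_block(self.scale * x + self.bias)
```

In a full local-learning loop, each head's output would feed a block-local loss, and update_ema would be called after each optimizer step on the adjacent block so the auxiliary copy keeps tracking it.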
Related papers
- TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree [52.44403214958304]
In this paper, we introduce TreeLoRA, a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds. Experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach.
arXiv Detail & Related papers (2025-06-12T05:25:35Z) - Without Paired Labeled Data: End-to-End Self-Supervised Learning for Drone-view Geo-Localization [2.733505168507872]
Drone-view Geo-Localization (DVGL) aims to achieve accurate localization of drones by retrieving the most relevant GPS-tagged satellite images. Existing methods heavily rely on strictly pre-paired drone-satellite images for supervised learning. We propose an end-to-end self-supervised learning method with a shallow backbone network.
arXiv Detail & Related papers (2025-02-17T02:53:08Z) - HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion [7.9514535887836795]
We propose a novel model that performs hierarchical locally supervised learning and patch-level feature fusion on auxiliary networks.
We conduct experiments on CIFAR-10, STL-10, SVHN, and ImageNet datasets, and the results demonstrate that our proposed HPFF significantly outperforms previous approaches.
arXiv Detail & Related papers (2024-07-08T06:05:19Z) - Momentum Auxiliary Network for Supervised Local Learning [7.5717621206854275]
Supervised local learning segments the network into multiple local blocks updated by independent auxiliary networks.
We propose a Momentum Auxiliary Network (MAN) that establishes a dynamic interaction mechanism.
Our method can reduce GPU memory usage by more than 45% on the ImageNet dataset compared to end-to-end training.
arXiv Detail & Related papers (2024-07-08T05:31:51Z) - Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks [9.718519843862937]
We introduce a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize sub-neural networks separately.
Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations.
arXiv Detail & Related papers (2023-12-20T08:02:33Z) - CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning [52.72888626663642]
Copy-move forgery detection aims at detecting duplicated regions in a suspected forged image.
Deep learning based copy-move forgery detection methods are in the ascendant.
We propose a Transformer-style copy-move forgery detection network named CMFDFormer.
We also provide a novel PCSD continual learning framework to help CMFDFormer handle new tasks.
arXiv Detail & Related papers (2023-11-22T09:27:46Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - Block-local learning with probabilistic latent representations [2.839567756494814]
Locking and weight transport are problems because they prevent efficient parallelization and horizontal scaling of the training process.
We propose a new method to address both these problems and scale up the training of large models.
We present results on a variety of tasks and architectures, demonstrating state-of-the-art performance using block-local learning.
arXiv Detail & Related papers (2023-05-24T10:11:30Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z) - LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)