MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network
- URL: http://arxiv.org/abs/2406.16633v4
- Date: Thu, 15 Aug 2024 11:59:15 GMT
- Title: MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network
- Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Junhao Su, Jiabin Liu, Changpeng Cai
- Abstract summary: We propose the Multilaminar Leap Augmented Auxiliary Network (MLAAN).
MLAAN captures both local and global features through independent and cascaded auxiliary networks.
We further design LAM, an enhanced auxiliary network that uses the Exponential Moving Average (EMA) method to facilitate information exchange between local modules.
Our experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets show that MLAAN can be seamlessly integrated into existing local learning frameworks.
- Score: 4.396837128416218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) typically employ an end-to-end (E2E) training paradigm which presents several challenges, including high GPU memory consumption, inefficiency, and difficulties in model parallelization during training. Recent research has sought to address these issues, with one promising approach being local learning. This method involves partitioning the backbone network into gradient-isolated modules and manually designing auxiliary networks to train these local modules. Existing methods often neglect the interaction of information between local modules, leading to myopic issues and a performance gap compared to E2E training. To address these limitations, we propose the Multilaminar Leap Augmented Auxiliary Network (MLAAN). Specifically, MLAAN comprises Multilaminar Local Modules (MLM) and Leap Augmented Modules (LAM). MLM captures both local and global features through independent and cascaded auxiliary networks, alleviating performance issues caused by insufficient global features. However, overly simplistic auxiliary networks can impede MLM's ability to capture global information. To address this, we further design LAM, an enhanced auxiliary network that uses the Exponential Moving Average (EMA) method to facilitate information exchange between local modules, thereby mitigating the shortsightedness resulting from inadequate interaction. The synergy between MLM and LAM has demonstrated excellent performance. Our experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets show that MLAAN can be seamlessly integrated into existing local learning frameworks, significantly enhancing their performance and even surpassing end-to-end (E2E) training methods, while also reducing GPU memory consumption.
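The recipe the abstract outlines — a backbone split into gradient-isolated modules, a small auxiliary classifier per module, and EMA-based information exchange between neighbouring auxiliary heads — can be sketched roughly as follows. This is a minimal PyTorch illustration under assumed names and shapes (`LocalBlock`, `AuxHead`, `ema_update`, the shared classifier width), not the authors' MLAAN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxHead(nn.Module):
    """Small auxiliary classifier for one local module (illustrative design)."""
    def __init__(self, channels, hidden=64, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(channels, hidden)   # per-module projection
        self.fc = nn.Linear(hidden, num_classes)  # shape-shared part, suitable for EMA exchange

    def forward(self, x):
        return self.fc(torch.relu(self.proj(self.pool(x).flatten(1))))

class LocalBlock(nn.Module):
    """One gradient-isolated backbone module plus its auxiliary head."""
    def __init__(self, in_ch, out_ch, num_classes=10):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.aux = AuxHead(out_ch, num_classes=num_classes)

    def forward(self, x):
        return self.body(x)

@torch.no_grad()
def ema_update(target: nn.Module, source: nn.Module, decay=0.99):
    """Share information between auxiliary heads via an exponential moving average."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

blocks = nn.ModuleList([LocalBlock(3, 32), LocalBlock(32, 64), LocalBlock(64, 128)])
opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]

def local_training_step(x, y):
    h = x
    for i, (block, opt) in enumerate(zip(blocks, opts)):
        h = block(h.detach())                       # detach: no gradient crosses module boundaries
        loss = F.cross_entropy(block.aux(h), y)     # each module trains on its own local loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        if i > 0:                                   # EMA exchange between neighbouring aux classifiers
            ema_update(block.aux.fc, blocks[i - 1].aux.fc)
    return h

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
local_training_step(x, y)
```

The EMA step here couples only the shape-compatible classifier weights of adjacent modules; where exactly the exchange happens, and how the cascaded auxiliary networks of MLM are built, is specified in the paper rather than in this sketch.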
Related papers
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training [15.462969044840868]
We introduce LW-FedMML, a layer-wise federated multimodal learning approach which decomposes the training process into multiple stages.
We conduct extensive experiments across various FL and multimodal learning settings to validate the effectiveness of our proposed method.
Specifically, LW-FedMML reduces memory usage by up to $2.7\times$, computational operations (FLOPs) by $2.4\times$, and total communication cost by $2.3\times$.
arXiv Detail & Related papers (2024-07-22T07:06:17Z)
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion [7.9514535887836795]
We propose a novel model that performs hierarchical locally supervised learning and patch-level feature fusion on auxiliary networks.
We conduct experiments on CIFAR-10, STL-10, SVHN, and ImageNet datasets, and the results demonstrate that our proposed HPFF significantly outperforms previous approaches.
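A plausible reading of patch-level feature fusion on the auxiliary network is that each local module's feature map is pooled into a grid of patch descriptors that are fused before the auxiliary loss is computed. The sketch below only illustrates that reading; `PatchFusionAuxHead`, the patch grid, and the fusion layer are assumptions, not HPFF's published design.

```python
import torch
import torch.nn as nn

class PatchFusionAuxHead(nn.Module):
    """Auxiliary classifier that pools a feature map into a grid of patch
    descriptors and fuses them before classification (illustrative only)."""
    def __init__(self, channels, num_classes=10, grid=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(grid)                   # one descriptor per patch
        self.fuse = nn.Linear(channels * grid * grid, channels)  # fuse patch descriptors
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, feat):                    # feat: (B, C, H, W) from a local module
        patches = self.pool(feat).flatten(1)    # (B, C * grid * grid)
        fused = torch.relu(self.fuse(patches))
        return self.fc(fused)

head = PatchFusionAuxHead(channels=64)
logits = head(torch.randn(4, 64, 16, 16))       # -> shape (4, 10)
```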
arXiv Detail & Related papers (2024-07-08T06:05:19Z)
- Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large Language Models (LLMs) have revolutionized natural language processing tasks.
Their deployment in wireless networks still faces challenges, namely a lack of privacy and security protection mechanisms.
We introduce two personalized wireless federated fine-tuning methods with low communication overhead.
arXiv Detail & Related papers (2024-04-20T02:30:21Z)
- LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin [85.16356890023582]
We propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them using a router network.
It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks.
Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks.
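The blurb above describes the key layer pattern: the backbone weights stay frozen while several low-rank adapters are mixed by a router. A rough sketch of such a layer, with assumed names (`LoRAExpert`, `LoRAMoELayer`) and a plain softmax router standing in for the paper's routing and balancing details:

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One low-rank adapter, applied as a residual update B(A(x))."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.B.weight)             # adapters start as a no-op update

    def forward(self, x):
        return self.B(self.A(x))

class LoRAMoELayer(nn.Module):
    """Frozen base linear layer plus several LoRA experts mixed by a router."""
    def __init__(self, dim, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.weight.requires_grad_(False)    # backbone stays frozen
        self.base.bias.requires_grad_(False)
        self.experts = nn.ModuleList(LoRAExpert(dim, rank) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):                                         # x: (B, T, dim)
        gate = torch.softmax(self.router(x), dim=-1)              # (B, T, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, dim, E)
        mixed = (expert_out * gate.unsqueeze(-2)).sum(-1)         # weighted sum of adapter outputs
        return self.base(x) + mixed

layer = LoRAMoELayer(dim=64)
out = layer(torch.randn(2, 5, 64))                                # -> (2, 5, 64)
```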
arXiv Detail & Related papers (2023-12-15T17:45:06Z)
- Local Learning with Neuron Groups [15.578925277062657]
Local learning is an approach to model-parallelism that removes the standard end-to-end learning setup.
We study how local learning can be applied at the level of splitting layers or modules into sub-components.
arXiv Detail & Related papers (2023-01-18T16:25:10Z)
- Locally Supervised Learning with Periodic Global Guidance [19.41730292017383]
We propose Periodically Guided local Learning (PGL) to periodically reinstate the global objective into the local-loss-based training of neural networks.
We show that a simple periodic guidance scheme begets significant performance gains while having a low memory footprint.
arXiv Detail & Related papers (2022-08-01T13:06:26Z)
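The periodic global guidance described in the PGL blurb can be pictured as alternating between gradient-isolated local updates and an occasional ordinary end-to-end pass. The loop below is a schematic under assumed names and an assumed period `K`, not the paper's exact schedule:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two backbone blocks, each with a local head; the last head also serves the global loss.
block1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
head1, head2 = nn.Linear(64, 10), nn.Linear(64, 10)
params = (list(block1.parameters()) + list(block2.parameters())
          + list(head1.parameters()) + list(head2.parameters()))
opt = torch.optim.SGD(params, lr=0.1)

K = 10  # assumed guidance period: every K-th step uses the global objective

def train_step(step, x, y):
    opt.zero_grad()
    if step % K == 0:
        # periodic global guidance: ordinary end-to-end loss through both blocks
        loss = F.cross_entropy(head2(block2(block1(x))), y)
    else:
        # local-loss training: gradients stop at the module boundary
        h1 = block1(x)
        h2 = block2(h1.detach())
        loss = F.cross_entropy(head1(h1), y) + F.cross_entropy(head2(h2), y)
    loss.backward()
    opt.step()
    return loss.item()

for step in range(3):
    train_step(step, torch.randn(8, 32), torch.randint(0, 10, (8,)))
```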