Related papers: MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

URL: http://arxiv.org/abs/2406.16633v1
Date: Mon, 24 Jun 2024 13:30:55 GMT
Title: MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network
Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Jiabin Liu, Changpeng Cai
Abstract summary: Local learning is considered a novel interactive training method that holds promise as an alternative to E2E. Conventional local learning methods fall short in achieving high model accuracy due to inadequate local inter-module interactions. We introduce a new model known as the Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network (MLAAN).
Score: 4.586209809964039
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end (E2E) training approaches are commonly plagued by high memory consumption, reduced efficiency in training, challenges in model parallelization, and suboptimal biocompatibility. Local learning is considered a novel interactive training method that holds promise as an alternative to E2E. Nonetheless, conventional local learning methods fall short in achieving high model accuracy due to inadequate local inter-module interactions. In this paper, we introduce a new model known as the Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network (MLAAN). MLAAN features an innovative supervised local learning approach coupled with a robust reinforcement module. This dual-component design enables the MLAAN to integrate smoothly with established local learning techniques, thereby enhancing the efficacy of the foundational methods. The method simultaneously acquires the local and global features of the model separately by constructing an independent auxiliary network and a cascade auxiliary network on the one hand and incorporates a leap augmented module, which serves to counteract the reduced learning capacity often associated with weaker supervision. This architecture not only augments the exchange of information amongst the local modules but also effectively mitigates the model's tendency toward myopia. The experimental evaluations conducted on four benchmark datasets, CIFAR-10, STL-10, SVHN, and ImageNet, demonstrate that the integration of MLAAN with existing supervised local learning methods significantly enhances the original methodologies. Of particular note, MLAAN enables local learning methods to comprehensively outperform end-to-end training approaches in terms of optimal performance while saving GPU memory.

Related papers

MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks [10.200277827846076]
We present the Momentum Auxiliary Network++ (MAN++) for supervised local learning. We show that MAN++ achieves performance comparable to end-to-end training while significantly reducing GPU memory usage.
arXiv Detail & Related papers (2025-07-22T06:50:19Z)
PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning [54.73049408950049]
We propose a Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning. Our approach improves unified multimodal retrieval from both structural and learning perspectives.
arXiv Detail & Related papers (2025-07-10T16:47:25Z)
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering [27.812611421754482]
We propose an MLLMs-based dual momentum Mixture-of-Experts (CL-MoE) framework for continual visual question answering (VQA). We integrate MLLMs with continual learning to utilize the rich commonsense knowledge in LLMs. Our method achieves state-of-the-art performance on 10 VQA tasks, proving the effectiveness of our approach.
arXiv Detail & Related papers (2025-03-01T09:25:23Z)
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models. Our approach employs activation sparsity to extract experts. Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives. Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance. In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training [15.462969044840868]
We introduce LW-FedMML, a layer-wise federated multimodal learning approach which decomposes the training process into multiple stages. We conduct extensive experiments across various FL and multimodal learning settings to validate the effectiveness of our proposed method. Specifically, LW-FedMML reduces memory usage by up to $2.7\times$, computational operations (FLOPs) by $2.4\times$, and total communication cost by $2.3\times$.
arXiv Detail & Related papers (2024-07-22T07:06:17Z)
R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML). A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming. This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding. A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion [7.9514535887836795]
We propose a novel model that performs hierarchical locally supervised learning and patch-level feature on auxiliary networks. We conduct experiments on CIFAR-10, STL-10, SVHN, and ImageNet datasets, and the results demonstrate that our proposed HPFF significantly outperforms previous approaches.
arXiv Detail & Related papers (2024-07-08T06:05:19Z)
Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large Language Models (LLMs) have revolutionized natural language processing tasks. Their deployment in wireless networks still faces challenges, i.e., a lack of privacy and security protection mechanisms. We introduce two personalized wireless federated fine-tuning methods with low communication overhead.
arXiv Detail & Related papers (2024-04-20T02:30:21Z)
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin [85.16356890023582]
We propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network. It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks.
arXiv Detail & Related papers (2023-12-15T17:45:06Z)
Local Learning with Neuron Groups [15.578925277062657]
Local learning is an approach to model-parallelism that removes the standard end-to-end learning setup. We study how local learning can be applied at the level of splitting layers or modules into sub-components.
arXiv Detail & Related papers (2023-01-18T16:25:10Z)
Locally Supervised Learning with Periodic Global Guidance [19.41730292017383]
We propose Periodically Guided local Learning (PGL) to reinstate the global objective repetitively into the local-loss based training of neural networks. We show that a simple periodic guidance scheme begets significant performance gains while having a low memory footprint.
arXiv Detail & Related papers (2022-08-01T13:06:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.