Modularizing while Training: A New Paradigm for Modularizing DNN Models
- URL: http://arxiv.org/abs/2306.09376v3
- Date: Thu, 5 Oct 2023 10:44:36 GMT
- Title: Modularizing while Training: A New Paradigm for Modularizing DNN Models
- Authors: Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao
- Abstract summary: We propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT).
The accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline.
The total time cost required for training and modularizing is only 108 minutes, half that of the baseline.
- Score: 20.892788625187702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network (DNN) models have become increasingly crucial components
in intelligent software systems. However, training a DNN model is typically
expensive in terms of both time and money. To address this issue, researchers
have recently focused on reusing existing DNN models - borrowing the idea of
code reuse in software engineering. However, reusing an entire model can incur
extra overhead or inherit weaknesses from its undesired functionalities. Hence,
existing work proposes to decompose an already trained
model into modules, i.e., modularizing-after-training, and enable module reuse.
Since trained models are not built for modularization,
modularizing-after-training incurs huge overhead and model accuracy loss. In
this paper, we propose a novel approach that incorporates modularization into
the model training process, i.e., modularizing-while-training (MwT). We train a
model to be structurally modular through two loss functions that optimize
intra-module cohesion and inter-module coupling. We have implemented the
proposed approach for modularizing Convolutional Neural Network (CNN) models in
this work. The evaluation results on representative models demonstrate that MwT
outperforms the state-of-the-art approach. Specifically, the accuracy loss
caused by MwT is only 1.13 percentage points, which is 1.76 percentage points
less than that of the baseline. The kernel retention rate of the modules
generated by MwT is only 14.58%, with a reduction of 74.31% over the
state-of-the-art approach. Furthermore, the total time cost required for
training and modularizing is only 108 minutes, half that of the baseline.
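The two training objectives described above, intra-module cohesion and inter-module coupling, can be pictured as an ordinary classification loss plus two regularizers defined over a soft assignment of convolution kernels to modules. The PyTorch sketch below is only illustrative: the KernelModuleMask layout, the surrogate formulas for the two losses, and the weights alpha and beta are assumptions made for exposition, not the definitions used in the MwT paper.

```python
# Minimal sketch of modularizing-while-training: a task loss combined with a
# cohesion term and a coupling term over a learnable kernel-to-module mask.
# The mask layout and both loss surrogates are illustrative assumptions,
# not the formulations from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelModuleMask(nn.Module):
    """Soft assignment of convolution kernels to modules (here, one module per class)."""

    def __init__(self, num_modules: int, num_kernels: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_modules, num_kernels))

    def forward(self) -> torch.Tensor:
        # (num_modules, num_kernels), each entry in (0, 1)
        return torch.sigmoid(self.logits)


def cohesion_loss(mask: torch.Tensor) -> torch.Tensor:
    # Intra-module cohesion surrogate: every kernel should be claimed strongly
    # by at least one module, so weak maximum assignments are penalized.
    return 1.0 - mask.max(dim=0).values.mean()


def coupling_loss(mask: torch.Tensor) -> torch.Tensor:
    # Inter-module coupling surrogate: penalize kernels shared across modules
    # via the normalized pairwise overlap of module masks (off-diagonal only).
    num_modules, num_kernels = mask.shape
    overlap = (mask @ mask.t()) / num_kernels
    off_diag = overlap[~torch.eye(num_modules, dtype=torch.bool, device=mask.device)]
    return off_diag.mean()


def training_step(model, mask_module, batch, optimizer, alpha=0.1, beta=0.1):
    # In a full implementation the mask would also gate the corresponding
    # kernel outputs in the model's forward pass; that is omitted here.
    x, y = batch
    optimizer.zero_grad()
    mask = mask_module()
    loss = (F.cross_entropy(model(x), y)
            + alpha * cohesion_loss(mask)
            + beta * coupling_loss(mask))
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a setup the optimizer would need to cover both the model parameters and the mask logits. After training, a module for a given class could then be extracted by keeping only the kernels whose mask entries exceed a threshold, which is the quantity the kernel retention rate reported above reflects.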
Related papers
- Improving DNN Modularization via Activation-Driven Training [5.4070914322511925]
MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers.
It accomplishes modularization with 29% less training time.
It improves the accuracy of a target class by 12% on average while ensuring minimal impact on the accuracy of other classes.
arXiv Detail & Related papers (2024-11-01T23:07:33Z) - Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models [31.960749305728488]
We introduce a novel concept dubbed the modular neural tangent kernel (mNTK).
We show that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$.
We propose a novel training strategy termed Modular Adaptive Training (MAT) to update only those modules whose $\lambda_{\max}$ exceeds a dynamic threshold.
arXiv Detail & Related papers (2024-05-13T07:46:48Z) - Reusing Convolutional Neural Network Models through Modularization and
Composition [22.823870645316397]
We propose two modularization approaches named CNNSplitter and GradSplitter.
CNNSplitter decomposes a trained convolutional neural network (CNN) model into $N$ small reusable modules.
The resulting modules can be reused to patch existing CNN models or build new CNN models through composition.
arXiv Detail & Related papers (2023-11-08T03:18:49Z) - ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike previous SMoE-based modular language models, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z) - Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z) - MILO: Model-Agnostic Subset Selection Framework for Efficient Model
Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$-$10\times$ faster and tune hyperparameters $20\times$-$75\times$ faster than full-dataset training or tuning, without compromising performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z) - Deep Model Assembling [31.88606253639418]
This paper studies a divide-and-conquer strategy to train large models.
It divides a large model into smaller modules, trains them independently, and reassembles the trained modules to obtain the target model.
We introduce a global, shared meta model to implicitly link all the modules together.
This enables us to train highly compatible modules that collaborate effectively when they are assembled together.
arXiv Detail & Related papers (2022-12-08T08:04:06Z) - Neural Network Module Decomposition and Recomposition [35.21448933547118]
We propose a modularization method that decomposes a deep neural network (DNN) into small modules from a functionality perspective.
We demonstrate that the proposed method can decompose and recompose DNNs with high compression ratio and high accuracy.
arXiv Detail & Related papers (2021-12-25T08:36:47Z) - Towards Efficient Post-training Quantization of Pre-trained Language
Models [85.68317334241287]
We study post-training quantization (PTQ) of PLMs, and propose module-wise reconstruction error minimization (MREM), an efficient solution to mitigate these issues.
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
arXiv Detail & Related papers (2021-09-30T12:50:06Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.