Improving DNN Modularization via Activation-Driven Training
- URL: http://arxiv.org/abs/2411.01074v1
- Date: Fri, 01 Nov 2024 23:07:33 GMT
- Title: Improving DNN Modularization via Activation-Driven Training
- Authors: Tuan Ngo, Abid Hassan, Saad Shafiq, Nenad Medvidovic
- Abstract summary: MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers.
It accomplishes modularization with 29% less training time.
It improves the accuracy of a target class by 12% on average while ensuring minimal impact on the accuracy of other classes.
- Score: 5.4070914322511925
- Abstract: Deep Neural Networks (DNNs) suffer from significant retraining costs when adapting to evolving requirements. Modularizing DNNs offers the promise of improving their reusability. Previous work has proposed techniques to decompose DNN models into modules both during and after training. However, these strategies suffer from several shortcomings, including significant weight overlaps and accuracy losses across modules, a restricted focus on convolutional layers, and added complexity and training time from introducing auxiliary masks to control modularity. In this work, we propose MODA, an activation-driven modular training approach. MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers based on three modular objectives: intra-class affinity, inter-class dispersion, and compactness. MODA is evaluated using three well-known DNN models and three datasets of varying sizes. This evaluation indicates that, compared to the existing state of the art, MODA yields several advantages: (1) MODA accomplishes modularization with 29% less training time; (2) the resultant modules generated by MODA comprise 2.4x fewer weights and 3.5x less weight overlap while (3) preserving the original model's accuracy without additional fine-tuning; in module replacement scenarios, (4) MODA improves the accuracy of a target class by 12% on average while ensuring minimal impact on the accuracy of other classes.
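To make the three objectives concrete, here is a minimal sketch of what such an activation regularizer might look like; the cosine-based formulation, the weighting hyperparameters, and the function name are illustrative assumptions, not MODA's published loss:

```python
import torch
import torch.nn.functional as F

def modular_activation_loss(acts, labels, w_intra=1.0, w_inter=1.0, w_comp=0.01):
    """Illustrative regularizer with the three objectives MODA names;
    the exact formulation here is assumed, not taken from the paper.

    acts:   (batch, dim) activation outputs of one layer
    labels: (batch,) integer class labels
    """
    normed = F.normalize(acts, dim=1)
    classes = labels.unique()
    # Per-class centroids of the normalized activations.
    centroids = torch.stack([normed[labels == c].mean(dim=0) for c in classes])
    centroids = F.normalize(centroids, dim=1)

    # Intra-class affinity: pull each sample toward its own class centroid.
    own = centroids[torch.searchsorted(classes, labels)]
    intra = (1.0 - (normed * own).sum(dim=1)).mean()

    # Inter-class dispersion: push centroids of different classes apart.
    if len(classes) > 1:
        sim = centroids @ centroids.t()
        off_diag = sim - torch.diag(sim.diag())
        inter = off_diag.sum() / (len(classes) * (len(classes) - 1))
    else:
        inter = acts.new_zeros(())

    # Compactness: encourage sparse activations so modules stay small.
    comp = acts.abs().mean()

    return w_intra * intra + w_inter * inter + w_comp * comp
```

In an activation-driven scheme like MODA's, a term of this kind would be added to the ordinary task loss at selected layers during training; the sketch is only meant to make the three objectives concrete.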
Related papers
- m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers [27.73393245438193]
We propose module-to-module knowledge distillation (m2mKD) for transferring knowledge between modules; a minimal sketch follows this entry.
We evaluate m2mKD on two modular neural architectures: Neural Attentive Circuits (NACs) and Vision Mixture-of-Experts (V-MoE).
Applying m2mKD to NACs yields significant improvements in IID accuracy on Tiny-ImageNet and OOD robustness on Tiny-ImageNet-R.
arXiv Detail & Related papers (2024-02-26T04:47:32Z)
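The m2mKD entry above gives only the one-line idea; a minimal sketch of distilling one teacher module into one student module, with the optional projection and MSE objective as assumptions rather than the paper's actual design:

```python
import torch
import torch.nn.functional as F

def m2m_distill_loss(student_module, teacher_module, x, proj=None):
    """Train a student module to match a teacher module's output on the
    same input (hypothetical m2mKD-style objective)."""
    with torch.no_grad():
        target = teacher_module(x)   # teacher stays frozen
    out = student_module(x)
    if proj is not None:             # optional projection if shapes differ
        out = proj(out)
    return F.mse_loss(out, target)
```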
- Reusing Convolutional Neural Network Models through Modularization and Composition [22.823870645316397]
We propose two modularization approaches named CNNSplitter and GradSplitter.
CNNSplitter decomposes a trained convolutional neural network (CNN) model into $N$ small reusable modules.
The resulting modules can be reused to patch existing CNN models or build new CNN models through composition; a toy composition sketch follows this entry.
arXiv Detail & Related papers (2023-11-08T03:18:49Z)
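As context for how decomposed per-class modules might be recombined (the paper's actual composition mechanism may differ), a toy sketch in which each module emits a score for its own class:

```python
import torch

def compose_modules(modules, x):
    """Toy composition: each module scores its own class, and the composed
    model predicts the class whose module scores highest."""
    # Each m(x) is assumed to return a (batch, 1) score for its class.
    scores = torch.cat([m(x) for m in modules], dim=1)  # (batch, n_modules)
    return scores.argmax(dim=1)
```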
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA); a generic low-rank adaptation sketch follows this entry.
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
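MMLoRA's multi-modal specifics are not described in the summary above, but the low-rank adaptation building block itself is standard; a generic sketch (rank and scaling chosen arbitrarily, not MMLoRA's settings):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA layer: y = Wx + scale * (B @ A) x, with W frozen."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen base layer, which is the usual LoRA design choice.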
- Module-wise Adaptive Distillation for Multimodality Foundation Models [125.42414892566843]
Multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes.
One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer.
Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module, and to distill the modules with greater contributions more frequently; a minimal selection sketch follows this entry.
arXiv Detail & Related papers (2023-10-06T19:24:00Z)
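The selection rule described above can be made concrete with a small sketch; sampling proportional to the recorded loss decrement, and the names below, are assumptions rather than the paper's algorithm:

```python
import random

class ModuleScheduler:
    """Hypothetical module selector: modules whose distillation most
    reduced the loss are chosen for distillation more often."""
    def __init__(self, module_names, eps=1e-3):
        self.gain = {m: eps for m in module_names}  # recorded loss decrements

    def pick(self):
        names = list(self.gain)
        weights = [self.gain[m] for m in names]
        return random.choices(names, weights=weights, k=1)[0]

    def update(self, name, loss_before, loss_after):
        # Record how much distilling this module decreased the loss.
        self.gain[name] = max(loss_before - loss_after, 1e-3)
```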
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner; a gating sketch follows this entry.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
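A minimal sketch of differentiable sparse sub-module activation, using a straight-through sigmoid gate as a stand-in for SMA's actual mechanism:

```python
import torch
import torch.nn as nn

class SparseGate(nn.Module):
    """Hypothetical sparse gating: each sequence element decides whether to
    route through a sub-module, with a hard gate in the forward pass and a
    soft gradient in the backward pass (straight-through estimator)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x, module):
        p = torch.sigmoid(self.score(x))  # (batch, seq, 1) soft gate
        hard = (p > 0.5).float()
        gate = hard + p - p.detach()      # hard value, soft gradient
        # Note: module(x) is masked, not skipped; a real implementation
        # would gather only the gated elements to save compute.
        return x + gate * module(x)
```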
- Modularizing while Training: A New Paradigm for Modularizing DNN Models [20.892788625187702]
We propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT).
The accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline.
The total time cost required for training and modularizing is only 108 minutes, half of the baseline.
arXiv Detail & Related papers (2023-06-15T07:45:43Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike previous SMoE-based modular language models, ModuleFormer can induce modularity from uncurated data; a generic top-k routing sketch follows this entry.
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
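ModuleFormer builds on sparse mixture-of-experts; a generic top-k MoE layer (not ModuleFormer's specific design, with expert sizes and k chosen arbitrarily) is sketched below:

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer: each token is routed to its
    k best-scoring expert MLPs and their outputs are combined."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                       # x: (tokens, dim)
        weights = self.gate(x).softmax(dim=-1)  # routing probabilities
        topw, topi = weights.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = topi[:, slot] == e       # tokens routed to expert e
                if mask.any():
                    out[mask] += topw[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The per-expert loop is written for clarity; production MoE implementations batch the dispatch and add a load-balancing loss, which is omitted here.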
- Bulk-Switching Memristor-based Compute-In-Memory Module for Deep Neural Network Training [15.660697326769686]
We propose a mixed-precision training scheme for memristor-based compute-in-memory (CIM) modules.
The proposed scheme is implemented with a system-on-chip (SoC) of fully integrated analog CIM modules and digital sub-systems.
The efficacy of training larger models is evaluated using realistic hardware parameters, showing that analog CIM modules can enable efficient mixed-precision training with accuracy comparable to full-precision software-trained models; a minimal update sketch follows this entry.
arXiv Detail & Related papers (2023-05-23T22:03:08Z)
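The mixed-precision scheme is only named in the summary above; one common pattern for analog CIM training, sketched here as an assumption about what such a scheme involves, is to accumulate updates in a high-precision digital copy while the analog array holds a coarsely quantized view of the weights:

```python
import numpy as np

def mixed_precision_step(w_digital, grad, lr=0.01, levels=16):
    """Hypothetical mixed-precision update: the digital copy accumulates
    small gradient steps; the analog (memristor) array is programmed to
    the nearest of a few coarse conductance levels."""
    w_digital = w_digital - lr * grad                  # high-precision update
    scale = np.abs(w_digital).max() / (levels // 2 - 1) + 1e-12
    w_analog = np.round(w_digital / scale) * scale     # coarse quantized view
    return w_digital, w_analog
```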
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Neural Network Module Decomposition and Recomposition [35.21448933547118]
We propose a modularization method that decomposes a deep neural network (DNN) into small modules from a functionality perspective.
We demonstrate that the proposed method can decompose and recompose DNNs with high compression ratio and high accuracy.
arXiv Detail & Related papers (2021-12-25T08:36:47Z)
- Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach to single image super-resolution (SISR) named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z)