Reusing Convolutional Neural Network Models through Modularization and
Composition
- URL: http://arxiv.org/abs/2311.04438v1
- Date: Wed, 8 Nov 2023 03:18:49 GMT
- Title: Reusing Convolutional Neural Network Models through Modularization and
Composition
- Authors: Binhang Qi, Hailong Sun, Hongyu Zhang, Xiang Gao
- Abstract summary: We propose two modularization approaches, CNNSplitter and GradSplitter, which decompose a trained convolutional neural network (CNN) model for $N$-class classification into $N$ small reusable modules.
The resulting modules can be reused to patch existing CNN models or to build new CNN models through composition.
- Score: 22.823870645316397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread success of deep learning technologies, many trained deep
neural network (DNN) models are now publicly available. However, directly
reusing public DNN models for new tasks often fails due to mismatched
functionality or performance. Inspired by the notion of modularization and
composition in software reuse, we investigate the possibility of improving the
reusability of DNN models in a more fine-grained manner. Specifically, we
propose two modularization approaches named CNNSplitter and GradSplitter, which
can decompose a trained convolutional neural network (CNN) model for $N$-class
classification into $N$ small reusable modules. Each module recognizes one of
the $N$ classes and contains a part of the convolution kernels of the trained
CNN model. Then, the resulting modules can be reused to patch existing CNN
models or build new CNN models through composition. The main difference between
CNNSplitter and GradSplitter lies in their search methods: the former relies on
a genetic algorithm to explore the search space, while the latter utilizes a
gradient-based search method. Our experiments with three representative CNNs on
three widely-used public datasets demonstrate the effectiveness of the proposed
approaches. Compared with CNNSplitter, GradSplitter incurs less accuracy loss,
produces much smaller modules (19.88% fewer kernels), and achieves better
results on patching weak models. In particular, experiments on GradSplitter
show that (1) by patching weak models, the average improvement in terms of
precision, recall, and F1-score is 17.13%, 4.95%, and 11.47%, respectively, and
(2) for a new task, compared with the models trained from scratch, reusing
modules achieves similar accuracy (the average loss of accuracy is only 2.46%)
without a costly training process. Our approaches offer a practical route to the
rapid development and improvement of CNN models.
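At the heart of GradSplitter's gradient-based search is the idea of attaching a differentiable relevance mask to each convolution kernel and letting ordinary gradient descent decide which kernels a class module keeps. The sketch below is a minimal PyTorch rendering of that idea, not the authors' implementation; `MaskedConv`, `extract_module`, the one-vs-rest loss, and the 0.5 threshold are all illustrative assumptions.
```python
# Illustrative GradSplitter-style sketch: learn a soft per-kernel mask by
# gradient descent, then keep the kernels whose mask survives a threshold.
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    """Wraps a trained conv layer with a learnable per-kernel relevance mask."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():          # the trained kernels stay frozen
            p.requires_grad_(False)
        self.mask_logits = nn.Parameter(torch.zeros(conv.out_channels))

    def forward(self, x):
        out = self.conv(x)
        relaxed = torch.sigmoid(self.mask_logits)  # soft mask in (0, 1)
        return out * relaxed.view(1, -1, 1, 1)

def extract_module(model, loader, target_class, steps=1000, lr=0.05):
    """Gradient search for the kernels that matter for one class (one-vs-rest)."""
    masked = [m for m in model.modules() if isinstance(m, MaskedConv)]
    opt = torch.optim.Adam([m.mask_logits for m in masked], lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _, (x, y) in zip(range(steps), loader):
        logits = model(x)[:, target_class]         # this class's output head
        target = (y == target_class).float()
        sparsity = sum(torch.sigmoid(m.mask_logits).mean() for m in masked)
        loss = bce(logits, target) + 0.01 * sparsity  # prefer small modules
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Binarize: a kernel belongs to the module if its relaxed mask survives 0.5.
    return [torch.sigmoid(m.mask_logits) > 0.5 for m in masked]
```
Under this reading, composition amounts to running the per-class modules side by side and predicting the class whose module emits the strongest one-vs-rest score, and patching a weak model replaces only the decision for its weak class.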
Related papers
- Robust Mixture-of-Expert Training for Convolutional Neural Networks [141.3531209949845]
Sparsely-gated Mixture of Experts (MoE) has demonstrated great promise for enabling high-accuracy and ultra-efficient model inference.
We propose a new router-expert alternating adversarial training framework for MoE, termed AdvMoE.
We find that AdvMoE achieves a 1% to 4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsely-gated MoE.
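Read operationally, "router-expert alternating adversarial training" suggests rounds in which only the router is updated on adversarial examples, then only the experts. The following is a minimal sketch under that reading; `router_parameters`, `expert_parameters`, and the PGD settings are illustrative assumptions, not the paper's code.
```python
# Illustrative sketch of router-expert alternating adversarial training
# (an AdvMoE-style loop). Assumptions: the MoE model exposes
# router_parameters() / expert_parameters(); inputs live in [0, 1].
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD: iterate signed-gradient steps in an eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the ball
    return x_adv.detach().clamp(0, 1)

def train_round(moe, loader, opt_router, opt_experts):
    """One alternating round: adversarial updates to the router, then the experts."""
    for opt in (opt_router, opt_experts):          # router phase, then expert phase
        for x, y in loader:
            x_adv = pgd_attack(moe, x, y)
            loss = F.cross_entropy(moe(x_adv), y)
            opt.zero_grad()
            loss.backward()
            opt.step()                             # moves only this phase's parameters
```
Here `opt_router` and `opt_experts` would be built over disjoint parameter groups (e.g. `torch.optim.SGD(moe.router_parameters(), lr=0.1)`), so each phase moves only its own weights.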
arXiv Detail & Related papers (2023-08-19T20:58:21Z)
- Patching Weak Convolutional Neural Network Models through Modularization and Composition [19.986199290508925]
A convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily.
We propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules.
We show that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
arXiv Detail & Related papers (2022-09-11T15:26:16Z)
- Lightweight Hybrid CNN-ELM Model for Multi-building and Multi-floor Classification [6.154022105385209]
We propose a lightweight combination of CNN and ELM, which provides quick and accurate classification of buildings and floors.
As a result, the proposed model is 58% faster than the benchmark, with a slight improvement in classification accuracy.
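What makes the ELM half of such a hybrid lightweight is that its hidden layer is random and fixed, so "training" reduces to one regularized least-squares solve for the output weights. A minimal NumPy sketch under that standard ELM formulation follows; the CNN feature matrix `X` and all names are illustrative assumptions.
```python
# Illustrative sketch of an ELM classifier head over CNN features.
# Assumption: X holds features already extracted by the CNN; only the output
# weights `beta` are fitted, via one ridge-regularized least-squares solve.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, Y, hidden=512, ridge=1e-3):
    """X: (n, d) feature matrix; Y: (n, k) one-hot labels."""
    W = rng.normal(size=(X.shape[1], hidden))  # random input weights, never trained
    b = rng.normal(size=hidden)                # random biases, never trained
    H = np.tanh(X @ W + b)                     # hidden activations
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```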
arXiv Detail & Related papers (2022-04-21T21:48:01Z)
- Lost Vibration Test Data Recovery Using Convolutional Neural Network: A Case Study [0.0]
This paper proposes a CNN algorithm for recovering lost sensor data, using the Alamosa Canyon Bridge as a real-structure case study.
Three different CNN models were considered to predict the readings of one or two malfunctioning sensors.
The accuracy of the model was increased by adding a convolutional layer.
arXiv Detail & Related papers (2022-04-11T23:24:03Z)
- Model Doctor: A Simple Gradient Aggregation Strategy for Diagnosing and Treating CNN Classifiers [33.82339346293966]
Convolutional neural networks (CNNs) have achieved excellent performance on classification tasks.
However, CNNs are widely regarded as 'black boxes' whose prediction mechanisms are hard to understand.
We propose the first completely automatic model diagnosing and treating tool, termed Model Doctor.
arXiv Detail & Related papers (2021-12-09T14:05:00Z)
- Decomposing Convolutional Neural Networks into Reusable and Replaceable Modules [15.729284470106826]
We propose to decompose a CNN model used for image classification problems into one module per output class.
These modules can further be reused or replaced to build a new model.
We have evaluated our approach on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets with three variations of ResNet models.
arXiv Detail & Related papers (2021-10-11T20:41:50Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNNs) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer- and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models against statistical models, the roofline model, and a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computation.
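One way to picture atom-coefficient decomposition is that every convolution kernel is a learned linear combination of a small shared dictionary of atoms, so weight sharing is built into the layer. The PyTorch sketch below is a minimal reading of that idea, not the paper's implementation; `AtomConv2d` and its arguments are illustrative.
```python
# Illustrative sketch of an atom-coefficient decomposed convolution:
# each of the out_ch * in_ch kernels is a linear combination of n_atoms
# shared k x k atoms, so parameters scale with the atoms, not the kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, n_atoms=8, padding=1):
        super().__init__()
        self.atoms = nn.Parameter(torch.randn(n_atoms, k, k) * 0.1)    # shared dictionary
        self.coef = nn.Parameter(torch.randn(out_ch, in_ch, n_atoms) * 0.1)
        self.padding = padding

    def forward(self, x):
        # Assemble full kernels (out_ch, in_ch, k, k) from atoms and coefficients.
        weight = torch.einsum("oia,akl->oikl", self.coef, self.atoms)
        return F.conv2d(x, weight, padding=self.padding)
```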
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.