Reusing Convolutional Neural Network Models through Modularization and
Composition
- URL: http://arxiv.org/abs/2311.04438v1
- Date: Wed, 8 Nov 2023 03:18:49 GMT
- Title: Reusing Convolutional Neural Network Models through Modularization and
Composition
- Authors: Binhang Qi, Hailong Sun, Hongyu Zhang, Xiang Gao
- Abstract summary: We propose two modularization approaches, CNNSplitter and GradSplitter, which decompose a trained convolutional neural network (CNN) model for $N$-class classification into $N$ small reusable modules.
The resulting modules can be reused to patch existing CNN models or to build new CNN models through composition.
- Score: 22.823870645316397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread success of deep learning technologies, many trained deep
neural network (DNN) models are now publicly available. However, directly
reusing public DNN models for new tasks often fails due to mismatched
functionality or performance. Inspired by the notion of modularization and
composition in software reuse, we investigate the possibility of improving the
reusability of DNN models in a more fine-grained manner. Specifically, we
propose two modularization approaches named CNNSplitter and GradSplitter, which
can decompose a trained convolutional neural network (CNN) model for $N$-class
classification into $N$ small reusable modules. Each module recognizes one of
the $N$ classes and contains a part of the convolution kernels of the trained
CNN model. Then, the resulting modules can be reused to patch existing CNN
models or build new CNN models through composition. The main difference between
CNNSplitter and GradSplitter lies in their search methods: the former relies on
a genetic algorithm to explore the search space, while the latter utilizes a
gradient-based search method. Our experiments with three representative CNNs on
three widely-used public datasets demonstrate the effectiveness of the proposed
approaches. Compared with CNNSplitter, GradSplitter incurs less accuracy loss,
produces much smaller modules (19.88% fewer kernels), and achieves better
results on patching weak models. In particular, experiments on GradSplitter
show that (1) by patching weak models, the average improvement in terms of
precision, recall, and F1-score is 17.13%, 4.95%, and 11.47%, respectively, and
(2) for a new task, compared with the models trained from scratch, reusing
modules achieves similar accuracy (the average loss of accuracy is only 2.46%)
without a costly training process. Our approaches offer a practical route to the
rapid development and improvement of CNN models.
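At the heart of GradSplitter's gradient-based search is the idea of attaching a differentiable relevance mask to each convolution kernel and letting ordinary gradient descent decide which kernels a class module keeps. The sketch below is a minimal PyTorch rendering of that idea, not the authors' implementation; `MaskedConv`, `extract_module`, the one-vs-rest loss, and the 0.5 threshold are all illustrative assumptions.
```python
# Illustrative GradSplitter-style sketch: learn a soft per-kernel mask by
# gradient descent, then keep the kernels whose mask survives a threshold.
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    """Wraps a trained conv layer with a learnable per-kernel relevance mask."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():          # the trained kernels stay frozen
            p.requires_grad_(False)
        self.mask_logits = nn.Parameter(torch.zeros(conv.out_channels))

    def forward(self, x):
        out = self.conv(x)
        relaxed = torch.sigmoid(self.mask_logits)  # soft mask in (0, 1)
        return out * relaxed.view(1, -1, 1, 1)

def extract_module(model, loader, target_class, steps=1000, lr=0.05):
    """Gradient search for the kernels that matter for one class (one-vs-rest)."""
    masked = [m for m in model.modules() if isinstance(m, MaskedConv)]
    opt = torch.optim.Adam([m.mask_logits for m in masked], lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _, (x, y) in zip(range(steps), loader):
        logits = model(x)[:, target_class]         # this class's output head
        target = (y == target_class).float()
        sparsity = sum(torch.sigmoid(m.mask_logits).mean() for m in masked)
        loss = bce(logits, target) + 0.01 * sparsity  # prefer small modules
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Binarize: a kernel belongs to the module if its relaxed mask survives 0.5.
    return [torch.sigmoid(m.mask_logits) > 0.5 for m in masked]
```
Under this reading, composition amounts to running the per-class modules side by side and predicting the class whose module emits the strongest one-vs-rest score, and patching a weak model replaces only the decision for its weak class.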
Related papers
- Robust Mixture-of-Expert Training for Convolutional Neural Networks [141.3531209949845]
Sparsely-gated Mixture of Experts (MoE) has demonstrated great promise for enabling high-accuracy and ultra-efficient model inference.
We propose a new router-expert alternating adversarial training framework for MoE, termed AdvMoE.
We find that AdvMoE achieves a 1% to 4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsely-gated MoE.
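Read operationally, "router-expert alternating adversarial training" suggests rounds in which only the router is updated on adversarial examples, then only the experts. The following is a minimal sketch under that reading; `router_parameters`, `expert_parameters`, and the PGD settings are illustrative assumptions, not the paper's code.
```python
# Illustrative sketch of router-expert alternating adversarial training
# (an AdvMoE-style loop). Assumptions: the MoE model exposes
# router_parameters() / expert_parameters(); inputs live in [0, 1].
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD: iterate signed-gradient steps in an eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the ball
    return x_adv.detach().clamp(0, 1)

def train_round(moe, loader, opt_router, opt_experts):
    """One alternating round: adversarial updates to the router, then the experts."""
    for opt in (opt_router, opt_experts):          # router phase, then expert phase
        for x, y in loader:
            x_adv = pgd_attack(moe, x, y)
            loss = F.cross_entropy(moe(x_adv), y)
            opt.zero_grad()
            loss.backward()
            opt.step()                             # moves only this phase's parameters
```
Here `opt_router` and `opt_experts` would be built over disjoint parameter groups (e.g. `torch.optim.SGD(moe.router_parameters(), lr=0.1)`), so each phase moves only its own weights.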
arXiv Detail & Related papers (2023-08-19T20:58:21Z)
- Patching Weak Convolutional Neural Network Models through Modularization and Composition [19.986199290508925]
A convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily.
We propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules.
We show that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
arXiv Detail & Related papers (2022-09-11T15:26:16Z)
- Lightweight Hybrid CNN-ELM Model for Multi-building and Multi-floor Classification [6.154022105385209]
We propose a lightweight combination of CNN and ELM, which provides quick and accurate classification of buildings and floors.
As a result, the proposed model is 58% faster than the benchmark, with a slight improvement in classification accuracy.
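What makes the ELM half of such a hybrid lightweight is that its hidden layer is random and fixed, so "training" reduces to one regularized least-squares solve for the output weights. A minimal NumPy sketch under that standard ELM formulation follows; the CNN feature matrix `X` and all names are illustrative assumptions.
```python
# Illustrative sketch of an ELM classifier head over CNN features.
# Assumption: X holds features already extracted by the CNN; only the output
# weights `beta` are fitted, via one ridge-regularized least-squares solve.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, Y, hidden=512, ridge=1e-3):
    """X: (n, d) feature matrix; Y: (n, k) one-hot labels."""
    W = rng.normal(size=(X.shape[1], hidden))  # random input weights, never trained
    b = rng.normal(size=hidden)                # random biases, never trained
    H = np.tanh(X @ W + b)                     # hidden activations
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```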
arXiv Detail & Related papers (2022-04-21T21:48:01Z)
- Lost Vibration Test Data Recovery Using Convolutional Neural Network: A Case Study [0.0]
This paper proposes a CNN algorithm for recovering lost sensor data, using the Alamosa Canyon Bridge as a real-structure case study.
Three different CNN models were considered to predict the readings of one or two malfunctioning sensors.
The accuracy of the model was increased by adding a convolutional layer.
arXiv Detail & Related papers (2022-04-11T23:24:03Z)
- Model Doctor: A Simple Gradient Aggregation Strategy for Diagnosing and Treating CNN Classifiers [33.82339346293966]
Convolutional neural networks (CNNs) have achieved excellent performance on classification tasks.
However, CNNs are widely regarded as 'black boxes' whose prediction mechanisms are hard to understand.
We propose the first completely automatic model diagnosing and treating tool, termed Model Doctor.
arXiv Detail & Related papers (2021-12-09T14:05:00Z)
- Decomposing Convolutional Neural Networks into Reusable and Replaceable Modules [15.729284470106826]
We propose to decompose a CNN model used for image classification problems into one module per output class.
These modules can further be reused or replaced to build a new model.
We have evaluated our approach on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets with three variations of ResNet models.
arXiv Detail & Related papers (2021-10-11T20:41:50Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNNs) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer- and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models against statistical models, the roofline model, and a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computation.
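One way to picture atom-coefficient decomposition is that every convolution kernel is a learned linear combination of a small shared dictionary of atoms, so weight sharing is built into the layer. The PyTorch sketch below is a minimal reading of that idea, not the paper's implementation; `AtomConv2d` and its arguments are illustrative.
```python
# Illustrative sketch of an atom-coefficient decomposed convolution:
# each of the out_ch * in_ch kernels is a linear combination of n_atoms
# shared k x k atoms, so parameters scale with the atoms, not the kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, n_atoms=8, padding=1):
        super().__init__()
        self.atoms = nn.Parameter(torch.randn(n_atoms, k, k) * 0.1)    # shared dictionary
        self.coef = nn.Parameter(torch.randn(out_ch, in_ch, n_atoms) * 0.1)
        self.padding = padding

    def forward(self, x):
        # Assemble full kernels (out_ch, in_ch, k, k) from atoms and coefficients.
        weight = torch.einsum("oia,akl->oikl", self.coef, self.atoms)
        return F.conv2d(x, weight, padding=self.padding)
```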
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.