Patching Weak Convolutional Neural Network Models through Modularization and Composition
- URL: http://arxiv.org/abs/2209.06116v3
- Date: Sun, 30 Jul 2023 03:33:51 GMT
- Title: Patching Weak Convolutional Neural Network Models through Modularization and Composition
- Authors: Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang
- Abstract summary: A convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily on some particular classes of objects.
We propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules.
We show that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
- Score: 19.986199290508925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite great success in many applications, deep neural networks are not
always robust in practice. For instance, a convolutional neural network (CNN)
model for classification tasks often performs unsatisfactorily in classifying
some particular classes of objects. In this work, we are concerned with
patching the weak part of a CNN model instead of improving it through the
costly retraining of the entire model. Inspired by the fundamental concepts of
modularization and composition in software engineering, we propose a compressed
modularization approach, CNNSplitter, which decomposes a strong CNN model for
$N$-class classification into $N$ smaller CNN modules. Each module is a
sub-model containing a part of the convolution kernels of the strong model. To
patch a weak CNN model that performs unsatisfactorily on a target class (TC),
we compose the weak CNN model with the corresponding module obtained from a
strong CNN model. The ability of the weak CNN model to recognize the TC can
thus be improved through patching. Moreover, the ability to recognize non-TCs
is also improved, as samples previously misclassified as the TC can now be
correctly classified as non-TCs. Experimental results with two representative
CNNs on three widely-used datasets show that the average improvements on the TC
in terms of precision and recall are 12.54% and 2.14%, respectively. Moreover, patching
improves the accuracy of non-TCs by 1.18%. The results demonstrate that
CNNSplitter can patch a weak CNN model through modularization and composition,
thus providing a new solution for developing robust CNN models.
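To make the composition step concrete, here is a minimal PyTorch sketch of patching a weak N-class model with a TC module taken from a strong model. The PatchedModel wrapper, the binary TC-vs-rest output convention, and the logit-substitution rule are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of patching-by-composition. Assumptions: the TC module
# is a binary "target class (TC) vs. rest" sub-model, and composition
# substitutes the weak model's TC logit; neither is the paper's exact rule.
import torch
import torch.nn as nn

class PatchedModel(nn.Module):
    """Composes a weak N-class CNN with a TC module from a strong model."""

    def __init__(self, weak_model: nn.Module, tc_module: nn.Module, tc_index: int):
        super().__init__()
        self.weak_model = weak_model  # weak N-class classifier
        self.tc_module = tc_module    # binary TC-vs-rest sub-model
        self.tc_index = tc_index      # index of the target class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.weak_model(x)   # shape (B, N)
        tc_score = self.tc_module(x)  # shape (B, 2): [non-TC, TC]
        # Substitute the weak model's TC logit with the module's TC evidence.
        # A real system would calibrate the two score scales before mixing.
        patched = logits.clone()
        patched[:, self.tc_index] = tc_score[:, 1]
        return patched
```

Under such a rule both TC and non-TC predictions can change: inputs whose TC logit was spuriously high in the weak model can fall back to the correct non-TC class, which is consistent with the non-TC accuracy gains reported above.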
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
- Reusing Convolutional Neural Network Models through Modularization and Composition [22.823870645316397]
We propose two modularization approaches named CNNSplitter and GradSplitter.
CNNSplitter decomposes a trained convolutional neural network (CNN) model into $N$ small reusable modules.
The resulting modules can be reused to patch existing CNN models or build new CNN models through composition; a minimal decomposition sketch follows this entry.
arXiv Detail & Related papers (2023-11-08T03:18:49Z)
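As a rough illustration of the decomposition side, the following PyTorch sketch carves a module out of a trained CNN by masking the convolution kernels not selected for it. The kernel-selection search that CNNSplitter actually performs is omitted (masks are given here), and all names are illustrative.

```python
# A minimal sketch: derive a module from a trained CNN by zeroing the
# convolution kernels that were not selected for it. A real implementation
# would also physically prune the zeroed kernels so the module is smaller.
import copy

import torch
import torch.nn as nn

def extract_module(model: nn.Module, kernel_masks: dict) -> nn.Module:
    """kernel_masks maps a Conv2d layer name to a 0/1 mask over its output channels."""
    module = copy.deepcopy(model)
    for name, layer in module.named_modules():
        if isinstance(layer, nn.Conv2d) and name in kernel_masks:
            mask = kernel_masks[name].float()  # shape (out_channels,)
            with torch.no_grad():
                layer.weight.mul_(mask.view(-1, 1, 1, 1))  # keep selected kernels
                if layer.bias is not None:
                    layer.bias.mul_(mask)
    return module
```

The interesting part, which this sketch leaves out, is finding masks such that the module still recognizes its class well while staying small.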
- Robust Mixture-of-Expert Training for Convolutional Neural Networks [141.3531209949845]
Sparsely-gated Mixture of Experts (MoE) has demonstrated great promise for enabling high-accuracy and ultra-efficient model inference.
We propose a new router-expert alternating adversarial training framework for MoE, termed AdvMoE.
We find that AdvMoE achieves a 1%-4% adversarial robustness improvement over the original dense CNN while retaining the efficiency merit of sparsity-gated MoE; a minimal sketch of the alternating scheme follows this entry.
arXiv Detail & Related papers (2023-08-19T20:58:21Z)
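As a rough sketch of that alternating scheme, the loop below switches between updating router parameters and expert parameters, training each phase on PGD adversarial examples. The optimizers, phase schedule, and attack settings are placeholder assumptions, not AdvMoE's actual recipe.

```python
# A minimal sketch of router-expert alternating adversarial training.
# `model` is assumed to be an MoE CNN whose parameters are pre-split into
# router_params and expert_params; the split itself is model-specific.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-inf PGD for crafting adversarial examples."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def train_alternating(model, loader, router_params, expert_params, epochs=100):
    opt_router = torch.optim.SGD(router_params, lr=0.01, momentum=0.9)
    opt_expert = torch.optim.SGD(expert_params, lr=0.1, momentum=0.9)
    for epoch in range(epochs):
        # Alternate phases: even epochs update the routers, odd epochs the experts.
        opt = opt_router if epoch % 2 == 0 else opt_expert
        for x, y in loader:
            x_adv = pgd_attack(model, x, y)
            loss = F.cross_entropy(model(x_adv), y)
            model.zero_grad(set_to_none=True)
            loss.backward()
            opt.step()
```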
- Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition [9.262289183808035]
This work aims to design a low-complexity spoken command recognition (SCR) system.
We exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline.
Our proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31% with 4 times fewer model parameters than the CNN model.
arXiv Detail & Related papers (2022-01-11T05:57:38Z)
- Decomposing Convolutional Neural Networks into Reusable and Replaceable Modules [15.729284470106826]
We propose to decompose a CNN model used for image classification problems into modules for each output class.
These modules can further be reused or replaced to build a new model.
We have evaluated our approach with CIFAR-10, CIFAR-100, and ImageNet tiny datasets with three variations of ResNet models.
arXiv Detail & Related papers (2021-10-11T20:41:50Z)
- Transformed CNNs: recasting pre-trained convolutional layers with self-attention [17.96659165573821]
Vision Transformers (ViT) have emerged as a powerful alternative to convolutional networks (CNNs).
In this work, we explore the idea of reducing the time spent training the self-attention layers by initializing them as convolutional layers.
With only 50 epochs of fine-tuning, the resulting T-CNNs demonstrate significant performance gains.
arXiv Detail & Related papers (2021-06-10T14:56:10Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that, with this regularization, CNNs maintain performance with a dramatic reduction in parameters and computations; a minimal sketch of an atom-coefficient decomposed convolution follows this entry.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
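One way to realize such weight sharing is to express every convolution kernel as a linear combination of a small shared bank of kernel "atoms". The layer below is a hedged sketch of that idea; the shapes, einsum recombination, and initialization are assumptions for illustration, not the paper's exact formulation.

```python
# A minimal sketch of an atom-coefficient decomposed convolution: all
# kernels share num_atoms spatial k x k atoms and differ only in their
# per-(out, in) mixing coefficients, cutting parameters when num_atoms < k * k.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomCoeffConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_atoms: int = 6, padding: int = 1):
        super().__init__()
        self.atoms = nn.Parameter(torch.randn(num_atoms, k, k) * 0.1)           # shared atom bank
        self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, num_atoms) * 0.1)  # per-kernel coefficients
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recombine kernels on the fly: weight[o, i] = sum_a coeff[o, i, a] * atoms[a]
        weight = torch.einsum('oia,akl->oikl', self.coeff, self.atoms)
        return F.conv2d(x, weight, padding=self.padding)
```

With in_ch = out_ch = 64, k = 3, and 6 atoms, this stores 64 * 64 * 6 + 6 * 9 parameters instead of 64 * 64 * 9, roughly a third fewer, and the shared atoms are a natural site for the structural regularization the entry describes.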
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)