Exploring the parameter reusability of CNN
- URL: http://arxiv.org/abs/2008.03411v2
- Date: Fri, 18 Sep 2020 04:23:25 GMT
- Title: Exploring the parameter reusability of CNN
- Authors: Wei Wang, Lin Cheng, Yanjie Zhu, Dong Liang
- Abstract summary: We propose a solution that can judge whether a given network is reusable or not based on the performance of reusing convolution kernels.
We define that the success of a CNN's parameter reuse depends upon two conditions: first, the network is a reusable network; and second, the RMSE between the convolution kernels from the source domain and target domain is small enough.
- Score: 12.654187477646449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent times, using small data to train networks has become a hot topic in the field of deep learning. Reusing pre-trained parameters is one of the most important strategies in semi-supervised and transfer learning. However, the fundamental reason for the success of these methods is still unclear. In this paper, we propose a solution that judges whether a given network is reusable based on the performance of reusing its convolution kernels, judges which layers' parameters of the network can be reused based on the performance of reusing the corresponding parameters, and, ultimately, judges whether those parameters are reusable in a target task based on the root mean square error (RMSE) of the corresponding convolution kernels. Specifically, we define that the success of a CNN's parameter reuse depends upon two conditions: first, the network is a reusable network; and second, the RMSE between the convolution kernels from the source domain and the target domain is small enough. The experimental results demonstrate that, when these conditions are met, the performance of reused parameters applied to target tasks is significantly improved.
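To make the RMSE criterion concrete, here is a minimal PyTorch sketch of comparing the convolution kernels of a source-domain model and a target-domain model layer by layer. The example models, the layer matching by module name, and the reuse threshold are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the paper's second condition: per-layer RMSE between the
# convolution kernels of a source-domain model and a target-domain model.
import torch
import torch.nn as nn


def conv_kernel_rmse(source: nn.Module, target: nn.Module) -> dict:
    """Return the per-layer RMSE between corresponding conv kernels."""
    src_convs = {n: m for n, m in source.named_modules() if isinstance(m, nn.Conv2d)}
    tgt_convs = {n: m for n, m in target.named_modules() if isinstance(m, nn.Conv2d)}
    rmse = {}
    for name, src in src_convs.items():
        tgt = tgt_convs.get(name)
        if tgt is None or src.weight.shape != tgt.weight.shape:
            continue  # only compare layers present in both networks with matching shapes
        diff = src.weight.detach() - tgt.weight.detach()
        rmse[name] = torch.sqrt(torch.mean(diff ** 2)).item()
    return rmse


if __name__ == "__main__":
    # Toy models with matching architectures; in practice these would be the
    # source-domain and target-domain trained networks.
    source_net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 16, 3))
    target_net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 16, 3))
    for layer, err in conv_kernel_rmse(source_net, target_net).items():
        # The 0.05 threshold is an arbitrary illustration, not the paper's value.
        print(f"{layer}: RMSE={err:.4f}", "reusable?", err < 0.05)
```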
Related papers
- NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability [0.0]
This paper presents neural networks for network intrusion detection systems (NIDS) that operate on flow data preprocessed with a time window.
It requires only eleven features, which do not rely on deep packet inspection, can be found in most NIDS datasets, and are easily obtained from conventional flow collectors.
The reported training accuracy exceeds 99% for the proposed method with as few as twenty neural network input features.
arXiv Detail & Related papers (2024-10-24T11:36:19Z)
- How Does Overparameterization Affect Features? [42.99771787546585]
We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features.
We then evaluate the performance of these models, and find that overparameterized networks outperform underparameterized networks.
We propose a toy setting to explain how overparameterized networks can learn some important features that underparameterized networks cannot learn.
arXiv Detail & Related papers (2024-07-01T05:01:03Z)
- Partial Network Cloning [58.83278629019384]
PNC conducts partial parametric "cloning" from a source network and then injects the cloned module into the target network.
Our method yields a significant improvement of 5% in accuracy and 50% in locality when compared with parameter-tuning based methods.
arXiv Detail & Related papers (2023-03-19T08:20:31Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have achieved wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods notably attempt to reduce the size of the parameter set by identifying and removing the irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z)
- Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy [42.15969584135412]
Neural network pruning is a popular technique used to reduce the inference costs of modern networks.
We evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well.
We find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks.
arXiv Detail & Related papers (2021-03-04T13:22:16Z)
- Using UNet and PSPNet to explore the reusability principle of CNN parameters [5.623232537411766]
Reusability of parameters in each layer of a deep convolutional neural network is experimentally quantified.
The running mean and running variance play a more important role than the weight and bias in the BN layer.
The bias in convolution layers is not sensitive and can be reused directly.
arXiv Detail & Related papers (2020-08-08T01:51:08Z)
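The UNet/PSPNet entry above suggests that batch-norm running statistics and convolution biases transfer particularly well. The sketch below is a minimal, hedged illustration of reusing only those quantities from a pre-trained PyTorch network; the helper name and the assumption of matching module names are ours, not the paper's.

```python
# Selective parameter reuse: copy only BN running statistics and conv biases
# from a pre-trained network into a target network with the same structure.
import torch
import torch.nn as nn


def reuse_bn_stats_and_conv_bias(pretrained: nn.Module, target: nn.Module) -> None:
    source_modules = dict(pretrained.named_modules())
    with torch.no_grad():
        for name, module in target.named_modules():
            src = source_modules.get(name)
            if src is None:
                continue
            if isinstance(module, nn.BatchNorm2d) and isinstance(src, nn.BatchNorm2d):
                # The entry reports running mean/variance matter more than weight/bias.
                module.running_mean.copy_(src.running_mean)
                module.running_var.copy_(src.running_var)
            if (isinstance(module, nn.Conv2d) and isinstance(src, nn.Conv2d)
                    and module.bias is not None and src.bias is not None):
                # Convolution biases are reported as insensitive, so reuse them directly.
                module.bias.copy_(src.bias)
```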
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
- Progressive Skeletonization: Trimming more fat from a network at initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
arXiv Detail & Related papers (2020-06-16T11:32:47Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
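The PeterRec entry above describes keeping pre-trained parameters frozen while learning only small injected networks. The adapter-style sketch below illustrates that general idea; the class name, dimensions, and residual placement are illustrative assumptions and do not reproduce PeterRec's architecture.

```python
# Freeze a pre-trained block and learn only a small injected "patch" network.
import torch
import torch.nn as nn


class PatchedBlock(nn.Module):
    """Wrap a frozen pre-trained block with a small trainable bottleneck patch."""

    def __init__(self, pretrained_block: nn.Module, channels: int, bottleneck: int = 8):
        super().__init__()
        self.block = pretrained_block
        for p in self.block.parameters():
            p.requires_grad = False  # pre-trained parameters remain unaltered
        # Small re-learned network; only these parameters are trained downstream.
        self.patch = nn.Sequential(
            nn.Linear(channels, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual-style injection keeps the frozen block's output intact.
        return self.block(x) + self.patch(x)
```

During fine-tuning, only the patch parameters would be passed to the optimizer, so the pre-trained weights stay fixed.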
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.