Speeding Up EfficientNet: Selecting Update Blocks of Convolutional
Neural Networks using Genetic Algorithm in Transfer Learning
- URL: http://arxiv.org/abs/2303.00261v1
- Date: Wed, 1 Mar 2023 06:35:29 GMT
- Title: Speeding Up EfficientNet: Selecting Update Blocks of Convolutional
Neural Networks using Genetic Algorithm in Transfer Learning
- Authors: Md. Mehedi Hasan, Muhammad Ibrahim, Md. Sawkat Ali
- Abstract summary: We devise a genetic algorithm to select blocks of layers for updating the parameters.
We show that our algorithm yields similar or better results than the baseline in terms of accuracy.
We also devise a metric called block importance to measure the efficacy of each block as an update block.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The performance of convolutional neural networks (CNN) depends heavily on
their architectures. Transfer learning performance of a CNN relies quite
strongly on selection of its trainable layers. Selecting the most effective
update layers for a certain target dataset often requires expert knowledge on
CNN architecture, which many practitioners do not possess. General users prefer
to use an available architecture (e.g., GoogLeNet, ResNet, EfficientNet)
developed by domain experts. With the ever-growing number of layers, it is
increasingly difficult and cumbersome to handpick the update layers.
Therefore, in this paper we explore the application of a genetic algorithm to
mitigate this problem. The convolutional layers of popular
pretrained networks are often grouped into modules that constitute their
building blocks. We devise a genetic algorithm to select blocks of layers for
updating the parameters. By experimenting with EfficientNetB0 pre-trained on
ImageNet and using Food-101, CIFAR-100 and MangoLeafBD as target datasets, we
show that our algorithm yields similar or better results than the baseline in
terms of accuracy, and requires less training and evaluation time because
fewer parameters are updated. We also devise a metric called block importance
to measure the efficacy of each block as an update block, and we analyze the
importance of the blocks selected by our algorithm.
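The core search is straightforward to prototype. Below is a minimal, hypothetical sketch of the idea: each chromosome is a binary mask over the pretrained network's blocks, and the genetic algorithm evolves masks toward higher fitness. The block count, GA hyperparameters, and the toy fitness function are assumptions for illustration; the paper's actual fitness comes from a short fine-tuning run on the target dataset.
```python
import random

# Illustrative values only; the paper tunes these on EfficientNetB0.
NUM_BLOCKS, POP_SIZE, GENERATIONS, MUT_RATE = 7, 10, 5, 0.1

def set_trainable_blocks(blocks, mask):
    """Freeze all blocks, then unfreeze those selected by the binary mask.
    `blocks` would be e.g. the MBConv stages of a pretrained EfficientNetB0;
    this helper is shown for context and not called in the toy loop below."""
    for block, trainable in zip(blocks, mask):
        for p in block.parameters():
            p.requires_grad = bool(trainable)

def fitness(mask):
    """Placeholder for the real fitness: briefly fine-tune with the masked
    blocks unfrozen and return validation accuracy. A noisy toy stands in."""
    return sum(mask) * 0.1 + random.random()

def crossover(a, b):
    cut = random.randrange(1, NUM_BLOCKS)   # one-point crossover
    return a[:cut] + b[cut:]

def mutate(mask):
    return [1 - g if random.random() < MUT_RATE else g for g in mask]

population = [[random.randint(0, 1) for _ in range(NUM_BLOCKS)]
              for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:POP_SIZE // 2]        # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print("best update-block mask:", max(population, key=fitness))
```
With such a mask in hand, only the selected blocks (plus the classifier head) are updated during fine-tuning, which is where the reported savings in training and evaluation time come from.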
Related papers
- Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called the time elastic neural network (teNN).
The novelty compared to classical neural network architectures is that it explicitly incorporates a time-warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z)
- Training Convolutional Neural Networks with the Forward-Forward algorithm [1.74440662023704]
The Forward-Forward (FF) algorithm has up to now only been used in fully connected networks.
We show how the FF paradigm can be extended to CNNs.
Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST hand-written digits dataset.
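As a rough illustration of how the FF paradigm applies to a convolutional layer, the hypothetical sketch below trains one conv layer locally by pushing a "goodness" score (mean squared activation) above a threshold for positive data and below it for negative data. The paper's spatially-extended labeling is replaced here by random stand-in tensors, and the threshold and learning rate are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFConv(nn.Module):
    """One convolutional layer trained locally, Forward-Forward style."""
    def __init__(self, c_in, c_out, theta=2.0, lr=0.03):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.theta = theta                     # goodness threshold
        self.opt = torch.optim.Adam(self.conv.parameters(), lr=lr)

    def goodness(self, x):
        # mean squared activation per sample
        return F.relu(self.conv(x)).pow(2).mean(dim=(1, 2, 3))

    def train_step(self, x_pos, x_neg):
        g_pos, g_neg = self.goodness(x_pos), self.goodness(x_neg)
        # push positive goodness above theta, negative goodness below it
        loss = (F.softplus(self.theta - g_pos) + F.softplus(g_neg - self.theta)).mean()
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # detach outputs so no gradient chains across layers (no backprop)
        return (F.relu(self.conv(x_pos)).detach(),
                F.relu(self.conv(x_neg)).detach(), loss.item())

# Random tensors stand in for label-embedded positive/negative images.
layer = FFConv(1, 8)
x_pos, x_neg = torch.rand(16, 1, 28, 28), torch.rand(16, 1, 28, 28)
for _ in range(10):
    _, _, loss = layer.train_step(x_pos, x_neg)
```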
arXiv Detail & Related papers (2023-12-22T18:56:35Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
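A hypothetical sketch of the cascaded idea: each block gets its own small prediction head and is trained with a standard cross-entropy loss on the detached output of the previous block, so no gradients cross block boundaries. The block widths, head design, and toy data are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10

def make_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

def make_head(c):
    # per-block predictor that outputs a label distribution
    return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(c, NUM_CLASSES))

blocks = [make_block(1, 16), make_block(16, 32)]
heads = [make_head(16), make_head(32)]
opts = [torch.optim.Adam(list(b.parameters()) + list(h.parameters()), lr=1e-3)
        for b, h in zip(blocks, heads)]

x = torch.rand(32, 1, 28, 28)                # toy batch
y = torch.randint(0, NUM_CLASSES, (32,))

h = x
for block, head, opt in zip(blocks, heads, opts):
    h = h.detach()                           # no gradient to earlier blocks
    out = block(h)
    loss = F.cross_entropy(head(out), y)     # each block trains independently
    opt.zero_grad(); loss.backward(); opt.step()
    h = out
```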
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks [0.0]
We present a method for transferring parameters between neural networks with different architectures.
Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently.
In experiments on ImageNet, our method improved validation accuracy by an average of 1.6 times after 50 epochs of training.
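The block and layer matching can be pictured as a classic alignment problem. The sketch below is a generic dynamic-programming alignment over two layer lists under an assumed similarity score; it is not the paper's actual DPIAT cost function.
```python
def align_layers(src, tgt, sim):
    """Needleman-Wunsch-style DP: maximize total similarity of matched
    layer pairs while keeping both sequences in order."""
    n, m = len(src), len(tgt)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1] + sim(src[i - 1], tgt[j - 1]))
    pairs, i, j = [], n, m              # backtrack to recover the matching
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + sim(src[i - 1], tgt[j - 1]):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Toy similarity: layers match better when their channel counts are close.
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
print(align_layers([64, 128, 256], [64, 96, 128, 256], sim))
```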
arXiv Detail & Related papers (2022-12-28T17:35:41Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual data ($2\mathrm{D}$).
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
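A hypothetical sketch of the core mechanism: instead of storing a fixed-size kernel, a small MLP maps (relative) positions to kernel values, so the same layer can materialize a kernel at any resolution or length. The MLP width and the sampling scheme are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousConv1d(nn.Module):
    """Conv layer whose kernel is generated by an MLP over positions."""
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        self.c_in, self.c_out = c_in, c_out
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, c_in * c_out))

    def forward(self, x, kernel_size):
        # sample the continuous kernel at `kernel_size` positions in [-1, 1]
        pos = torch.linspace(-1, 1, kernel_size).unsqueeze(-1)
        w = self.kernel_net(pos)                       # (k, c_in * c_out)
        w = w.view(kernel_size, self.c_out, self.c_in).permute(1, 2, 0)
        return F.conv1d(x, w, padding=kernel_size // 2)

layer = ContinuousConv1d(4, 8)
x = torch.rand(2, 4, 100)
y_small = layer(x, kernel_size=9)     # same weights, different resolutions
y_large = layer(x, kernel_size=33)
```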
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
- Tricks and Plugins to GBM on Images and Sequences [18.939336393665553]
We propose a new algorithm for boosting deep convolutional neural networks (BoostCNN), combining the merits of dynamic feature selection and boosting.
We also propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least squares objective function.
Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks.
arXiv Detail & Related papers (2022-03-01T21:59:00Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
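The mechanism is easy to sketch: during fine-tuning, the output layer is periodically re-initialized, which forces fresh, larger gradients to flow back into the earlier CNN layers. The interval, initializer, and toy model below are assumptions.
```python
import torch
import torch.nn as nn

def maybe_reinit_head(head: nn.Linear, epoch: int, every: int = 4):
    """RIFLE-style periodic re-initialization of the final FC layer."""
    if epoch > 0 and epoch % every == 0:
        nn.init.kaiming_normal_(head.weight)
        if head.bias is not None:
            nn.init.zeros_(head.bias)

# Toy fine-tuning loop on random data.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU())
head = nn.Linear(32, 10)
opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()),
                      lr=0.01)
x, y = torch.rand(128, 64), torch.randint(0, 10, (128,))

for epoch in range(12):
    maybe_reinit_head(head, epoch)           # the RIFLE step
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
```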
arXiv Detail & Related papers (2020-07-07T11:27:43Z)
- Inferring Convolutional Neural Networks' accuracies from their architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
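The setup is a standard supervised problem over architecture descriptors. The sketch below uses entirely synthetic attribute vectors and labels, purely to show the shape of the pipeline; the paper's actual attributes and data are not reproduced.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: [depth, width, #params (M), #skip connections]
X = rng.uniform([4, 16, 0.5, 0], [50, 512, 60, 30], size=(200, 4))
# Synthetic label: 1 if the network "beat" a threshold accuracy (made-up rule)
y = (0.3 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 2, 200) > 10).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```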
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.