PSO-Convolutional Neural Networks with Heterogeneous Learning Rate
- URL: http://arxiv.org/abs/2205.10456v3
- Date: Tue, 12 Sep 2023 14:22:36 GMT
- Title: PSO-Convolutional Neural Networks with Heterogeneous Learning Rate
- Authors: Nguyen Huu Phong, Augusto Santos, Bernardete Ribeiro
- Abstract summary: Convolutional Neural Networks (ConvNets or CNNs) have been widely deployed in computer vision and related fields.
In this article, we propose a novel Particle Swarm Optimization (PSO) based training for ConvNets.
In this framework, the vector of weights of each ConvNet is cast as the position of a particle in phase space, whereby PSO collaborative dynamics intertwine with Stochastic Gradient Descent (SGD) to boost training performance and generalization.
- Score: 4.243356707599486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional Neural Networks (ConvNets or CNNs) have been widely
deployed in computer vision and related fields. Nevertheless, the dynamics of
training these neural networks remain elusive: they are hard and
computationally expensive to train. A myriad of architectures and training
strategies have been proposed to overcome this challenge and to address
problems such as speech, image and action recognition as well as object
detection. In this article, we propose a novel Particle Swarm
Optimization (PSO) based training for ConvNets. In this framework, the vector
of weights of each ConvNet is typically cast as the position of a particle in
phase space whereby PSO collaborative dynamics intertwines with Stochastic
Gradient Descent (SGD) in order to boost training performance and
generalization. Our approach goes as follows: i) [regular phase] each ConvNet
is trained independently via SGD; ii) [collaborative phase] ConvNets share
among themselves their current vector of weights (or particle-position) along
with their gradient estimates of the loss function. Distinct ConvNets use
distinct step sizes. By properly blending ConvNets with large (possibly
random) step-sizes along with more conservative ones, we propose an algorithm
with competitive performance with respect to other PSO-based approaches on
CIFAR-10 and CIFAR-100 (accuracies of 98.31% and 87.48%, respectively). These
accuracy levels are obtained using only four ConvNets; such results are
expected to scale with the number of collaborating ConvNets. We make our
source code available for download at
https://github.com/leonlha/PSO-ConvNet-Dynamics.
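
To make the two-phase dynamics above concrete, here is a minimal NumPy sketch
of the idea. It is not the authors' training code or their exact update rule
(see the linked repository for that): a toy quadratic loss stands in for the
ConvNet loss, each "particle" is a plain weight vector, and the learning
rates, blending coefficients, and phase lengths are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):   # toy quadratic loss standing in for the ConvNet loss
    return 0.5 * np.sum(w ** 2)

def grad(w):   # gradient of the toy loss (stand-in for SGD's gradient estimate)
    return w

n_particles, dim = 4, 10                   # four "ConvNets", as in the paper
positions = rng.normal(size=(n_particles, dim))
lrs = np.array([0.30, 0.10, 0.03, 0.01])   # heterogeneous step sizes: large and conservative
personal_best = positions.copy()

for it in range(50):
    # i) regular phase: each particle runs a few independent SGD steps
    for k in range(n_particles):
        for _ in range(5):
            positions[k] -= lrs[k] * grad(positions[k])

    # ii) collaborative phase: particles share positions and gradients and are
    # nudged toward the best-performing ones (a PSO-style blending step)
    improved = np.array([loss(p) for p in positions]) < np.array([loss(p) for p in personal_best])
    personal_best[improved] = positions[improved]
    g_best = personal_best[np.argmin([loss(p) for p in personal_best])]
    for k in range(n_particles):
        c1, c2 = rng.uniform(0.0, 1.0, size=2)
        positions[k] += (0.5 * c1 * (personal_best[k] - positions[k])
                         + 0.5 * c2 * (g_best - positions[k])
                         - lrs[k] * grad(positions[k]))

print("final losses:", [float(loss(p)) for p in positions])
```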
Related papers
- Video Action Recognition Collaborative Learning with Dynamics via
PSO-ConvNet Transformer [1.876462046907555]
We propose a novel PSO-ConvNet model for learning actions in videos.
Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy.
Overall, our dynamic PSO-ConvNet model provides a promising direction for improving Human Action Recognition.
arXiv Detail & Related papers (2023-02-17T23:39:34Z) - MogaNet: Multi-order Gated Aggregation Network [64.16774341908365]
We propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning.
MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module.
MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet.
arXiv Detail & Related papers (2022-11-07T04:31:17Z) - Dynamic ConvNets on Tiny Devices via Nested Sparsity [3.0313758880048765]
This work introduces a new training and compression pipeline to build Nested Sparse ConvNets.
A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weights subsets.
The pipeline is tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit.
arXiv Detail & Related papers (2022-03-07T12:07:02Z) - Optimising for Interpretability: Convolutional Dynamic Alignment
Networks [108.83345790813445]
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA Nets).
Their core building blocks are Dynamic Alignment Units (DAUs), which are optimised to transform their inputs with dynamically computed weight vectors that align with task-relevant patterns.
CoDA Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions.
arXiv Detail & Related papers (2021-09-27T12:39:46Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We present a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - Convolutional Normalization: Improving Deep Convolutional Network
Robustness and Training [44.66478612082257]
Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets).
We introduce a simple and efficient "convolutional normalization" method that can fully exploit the convolutional structure in the Fourier domain.
We show that convolutional normalization can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network.
arXiv Detail & Related papers (2021-03-01T00:33:04Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Encoding the latent posterior of Bayesian Neural Networks for
uncertainty quantification [10.727102755903616]
We aim for efficient deep BNNs amenable to complex computer vision architectures.
We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer.
Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient (in terms of computation and memory during both training and testing) ensembles.
arXiv Detail & Related papers (2020-12-04T19:50:09Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)