ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/2106.09358v1
- Date: Thu, 17 Jun 2021 10:23:00 GMT
- Title: ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks
- Authors: Sudhakar Kumawat, Gagan Kanojia, and Shanmuganathan Raman
- Abstract summary: This paper studies the operation of channel shuffle as a regularization technique in deep convolutional networks.
We show that while randomly shuffling channels during training drastically reduces performance, randomly shuffling small patches between channels significantly improves it.
The ShuffleBlock module is easy to implement and improves the performance of several baseline networks on the task of image classification on CIFAR and ImageNet datasets.
- Score: 35.67192058479252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have enormous representational power which leads them to
overfit on most datasets. Thus, regularizing them is important in order to
reduce overfitting and enhance their generalization capabilities. Recently,
channel shuffle operation has been introduced for mixing channels in group
convolutions in resource efficient networks in order to reduce memory and
computations. This paper studies the operation of channel shuffle as a
regularization technique in deep convolutional networks. We show that while
random shuffling of channels during training drastically reduces their
performance, randomly shuffling small patches between channels
significantly improves it. The patches to be shuffled are picked
from the same spatial locations in the feature maps such that a patch, when
transferred from one channel to another, acts as structured noise for the latter
channel. We call this method "ShuffleBlock". The proposed ShuffleBlock module
is easy to implement and improves the performance of several baseline networks
on the task of image classification on the CIFAR and ImageNet datasets. It also
achieves performance comparable to, and in many cases better than, other
regularization methods. We provide several ablation studies on selecting
various hyperparameters of the ShuffleBlock module and propose a new scheduling
method that further enhances its performance.
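The abstract describes the patch-shuffle operation only at a high level, so the following PyTorch sketch illustrates one plausible reading of it. It is a minimal illustration, not the authors' released code: the module interface, the patch size, the application probability, and the use of a single shared patch location per forward pass are all assumptions.

```python
import torch
import torch.nn as nn


def random_channel_shuffle(x: torch.Tensor) -> torch.Tensor:
    # Full random permutation of channels -- the variant the abstract reports
    # as drastically hurting performance when applied during training.
    perm = torch.randperm(x.size(1), device=x.device)
    return x[:, perm]


class ShuffleBlock(nn.Module):
    # Patch-level shuffle (sketch): one spatial window, shared by all channels,
    # has its channels randomly permuted, so a patch moved from one channel to
    # another acts as structured noise for the receiving channel.
    def __init__(self, patch_size: int = 4, p: float = 0.5):
        super().__init__()
        self.patch_size = patch_size  # assumed patch size (hyperparameter)
        self.p = p                    # assumed probability of applying the shuffle

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity at inference time, and with probability 1 - p during training.
        if not self.training or torch.rand(1).item() > self.p:
            return x
        n, c, h, w = x.shape
        ps = self.patch_size
        if h < ps or w < ps:
            return x
        # Pick one spatial location, identical across all channels.
        top = torch.randint(0, h - ps + 1, (1,)).item()
        left = torch.randint(0, w - ps + 1, (1,)).item()
        perm = torch.randperm(c, device=x.device)
        out = x.clone()
        out[:, :, top:top + ps, left:left + ps] = x[:, perm, top:top + ps, left:left + ps]
        return out
```

In use, such a module would typically be inserted after convolutional stages of a baseline network (e.g. between residual blocks); it disables itself at evaluation time via `self.training`, in the spirit of Dropout-style regularizers.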
Related papers
- Dynamic Shuffle: An Efficient Channel Mixture Method [8.720510396996142]
We devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling.
Experimental results on image classification benchmark datasets show that our method significantly increases ShuffleNets' performance.
arXiv Detail & Related papers (2023-10-04T12:47:48Z)
- Revisiting Random Channel Pruning for Neural Network Compression [159.99002793644163]
Channel (or 3D filter) pruning serves as an effective way to accelerate the inference of neural networks.
In this paper, we try to determine the channel configuration of the pruned models by random search.
We show that this simple strategy works quite well compared with other channel pruning methods.
arXiv Detail & Related papers (2022-05-11T17:59:04Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Channel-Level Variable Quantization Network for Deep Image Compression [50.3174629451739]
We propose a channel-level variable quantization network that dynamically allocates more convolutions to significant channels and withdraws convolutions from negligible channels.
Our method achieves superior performance and can produce much better visual reconstructions.
arXiv Detail & Related papers (2020-07-15T07:20:39Z)
- Operation-Aware Soft Channel Pruning using Differentiable Masks [51.04085547997066]
We propose a data-driven algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations.
We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks.
arXiv Detail & Related papers (2020-07-08T07:44:00Z)
- Channel Compression: Rethinking Information Redundancy among Channels in CNN Architecture [3.3018563701013988]
Research on efficient convolutional neural networks (CNNs) aims at removing feature redundancy by decomposing or optimizing the convolutional calculation.
In this work, feature redundancy is assumed to exist among channels in CNN architectures, which provides some leeway to boost calculation efficiency.
A novel convolutional construction named compact convolution is proposed to embrace the progress in spatial convolution, channel grouping and pooling operation.
arXiv Detail & Related papers (2020-07-02T10:58:54Z)
- Multigrid-in-Channels Architectures for Wide Convolutional Neural Networks [6.929025509877642]
We present a multigrid approach that combats the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs).
Our examples from supervised image classification show that applying this strategy to residual networks and MobileNetV2 considerably reduces the number of parameters without negatively affecting accuracy.
arXiv Detail & Related papers (2020-06-11T20:28:36Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences [3.8848561367220276]
We present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization.
The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy.
It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription.
arXiv Detail & Related papers (2020-04-06T12:44:22Z)