ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/2106.09358v1
- Date: Thu, 17 Jun 2021 10:23:00 GMT
- Title: ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks
- Authors: Sudhakar Kumawat, Gagan Kanojia, and Shanmuganathan Raman
- Abstract summary: This paper studies the operation of channel shuffle as a regularization technique in deep convolutional networks.
We show that while randomly shuffling channels during training drastically reduces performance, randomly shuffling small patches between channels significantly improves it.
The ShuffleBlock module is easy to implement and improves the performance of several baseline networks on the task of image classification on CIFAR and ImageNet datasets.
- Score: 35.67192058479252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have enormous representational power which leads them to
overfit on most datasets. Thus, regularizing them is important in order to
reduce overfitting and enhance their generalization capabilities. Recently,
channel shuffle operation has been introduced for mixing channels in group
convolutions in resource efficient networks in order to reduce memory and
computations. This paper studies the operation of channel shuffle as a
regularization technique in deep convolutional networks. We show that while
random shuffling of channels during training drastically reduces their
performance, randomly shuffling small patches between channels
significantly improves it. The patches to be shuffled are picked
from the same spatial locations in the feature maps such that a patch, when
transferred from one channel to another, acts as structured noise for the latter
channel. We call this method "ShuffleBlock". The proposed ShuffleBlock module
is easy to implement and improves the performance of several baseline networks
on the task of image classification on the CIFAR and ImageNet datasets. It also
achieves performance comparable to, and in many cases better than, other
regularization methods. We provide several ablation studies on selecting
various hyperparameters of the ShuffleBlock module and propose a new scheduling
method that further enhances its performance.
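The abstract describes the patch-shuffle operation only at a high level, so the following PyTorch sketch illustrates one plausible reading of it. It is a minimal illustration, not the authors' released code: the module interface, the patch size, the application probability, and the use of a single shared patch location per forward pass are all assumptions.

```python
import torch
import torch.nn as nn


def random_channel_shuffle(x: torch.Tensor) -> torch.Tensor:
    # Full random permutation of channels -- the variant the abstract reports
    # as drastically hurting performance when applied during training.
    perm = torch.randperm(x.size(1), device=x.device)
    return x[:, perm]


class ShuffleBlock(nn.Module):
    # Patch-level shuffle (sketch): one spatial window, shared by all channels,
    # has its channels randomly permuted, so a patch moved from one channel to
    # another acts as structured noise for the receiving channel.
    def __init__(self, patch_size: int = 4, p: float = 0.5):
        super().__init__()
        self.patch_size = patch_size  # assumed patch size (hyperparameter)
        self.p = p                    # assumed probability of applying the shuffle

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity at inference time, and with probability 1 - p during training.
        if not self.training or torch.rand(1).item() > self.p:
            return x
        n, c, h, w = x.shape
        ps = self.patch_size
        if h < ps or w < ps:
            return x
        # Pick one spatial location, identical across all channels.
        top = torch.randint(0, h - ps + 1, (1,)).item()
        left = torch.randint(0, w - ps + 1, (1,)).item()
        perm = torch.randperm(c, device=x.device)
        out = x.clone()
        out[:, :, top:top + ps, left:left + ps] = x[:, perm, top:top + ps, left:left + ps]
        return out
```

In use, such a module would typically be inserted after convolutional stages of a baseline network (e.g. between residual blocks); it disables itself at evaluation time via `self.training`, in the spirit of Dropout-style regularizers.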
Related papers
- Dynamic Shuffle: An Efficient Channel Mixture Method [8.720510396996142]
We devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling.
Experimental results on image classification benchmark datasets show that our method significantly increases ShuffleNets' performance.
arXiv Detail & Related papers (2023-10-04T12:47:48Z)
- Revisiting Random Channel Pruning for Neural Network Compression [159.99002793644163]
Channel (or 3D filter) pruning serves as an effective way to accelerate the inference of neural networks.
In this paper, we try to determine the channel configuration of the pruned models by random search.
We show that this simple strategy works quite well compared with other channel pruning methods.
arXiv Detail & Related papers (2022-05-11T17:59:04Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Channel-Level Variable Quantization Network for Deep Image Compression [50.3174629451739]
We propose a channel-level variable quantization network that dynamically allocates more convolutions to significant channels and withdraws convolutions from negligible channels.
Our method achieves superior performance and can produce much better visual reconstructions.
arXiv Detail & Related papers (2020-07-15T07:20:39Z)
- Operation-Aware Soft Channel Pruning using Differentiable Masks [51.04085547997066]
We propose a data-driven algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations.
We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks.
arXiv Detail & Related papers (2020-07-08T07:44:00Z)
- Channel Compression: Rethinking Information Redundancy among Channels in CNN Architecture [3.3018563701013988]
Research on efficient convolutional neural networks (CNNs) aims at removing feature redundancy by decomposing or optimizing the convolutional calculation.
In this work, feature redundancy is assumed to exist among channels in CNN architectures, which provides some leeway to boost calculation efficiency.
A novel convolutional construction named compact convolution is proposed to embrace the progress in spatial convolution, channel grouping and pooling operation.
arXiv Detail & Related papers (2020-07-02T10:58:54Z)
- Multigrid-in-Channels Architectures for Wide Convolutional Neural Networks [6.929025509877642]
We present a multigrid approach that combats the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs).
Our examples from supervised image classification show that applying this strategy to residual networks and MobileNetV2 considerably reduces the number of parameters without negatively affecting accuracy.
arXiv Detail & Related papers (2020-06-11T20:28:36Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences [3.8848561367220276]
We present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization.
The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy.
It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription.
arXiv Detail & Related papers (2020-04-06T12:44:22Z)