Dynamic Shuffle: An Efficient Channel Mixture Method
- URL: http://arxiv.org/abs/2310.02776v1
- Date: Wed, 4 Oct 2023 12:47:48 GMT
- Title: Dynamic Shuffle: An Efficient Channel Mixture Method
- Authors: Kaijun Gong, Zhuowen Yin, Yushu Li, Kailing Guo, Xiangmin Xu
- Abstract summary: We devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling.
Experimental results on image classification benchmark datasets show that our method significantly improves ShuffleNets' performance.
- Score: 8.720510396996142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The redundancy of convolutional neural networks depends not only on
weights but also on inputs. Shuffling is an efficient operation for mixing
channel information, but the shuffle order is usually pre-defined. To reduce
the data-dependent redundancy, we devise a dynamic shuffle module that
generates data-dependent permutation matrices for shuffling. Since the
dimension of the permutation matrix is proportional to the square of the
number of input channels, to make the generation process efficient we divide
the channels into groups, generate two shared small permutation matrices for
each group, and use the Kronecker product together with a cross-group shuffle
to obtain the final permutation matrices. To make the generation process
learnable, and based on theoretical analysis, softmax, orthogonal
regularization, and binarization are employed to asymptotically approximate
the permutation matrix. Dynamic shuffle adaptively mixes channel information
with negligible extra computation and memory occupancy. Experimental results
on the image classification benchmark datasets CIFAR-10, CIFAR-100, Tiny
ImageNet, and ImageNet show that our method significantly improves
ShuffleNets' performance. By adding the dynamically generated matrix to a
learnable static matrix, we further propose static-dynamic-shuffle and show
that it can serve as a lightweight replacement for ordinary pointwise
convolution.
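The abstract describes the mechanism but not an implementation. Below is a minimal, hedged PyTorch sketch of the core idea as stated above: channels are split into groups, two small matrices per group are predicted from pooled features, a row-wise softmax softly approximates permutation matrices, and their Kronecker product followed by a cross-group shuffle mixes the channels. All names here (DynamicShuffleSketch, the prediction head, the groups/sub parameters) are illustrative assumptions rather than the authors' code, and orthogonal regularization and binarization are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicShuffleSketch(nn.Module):
    """Illustrative sketch of a data-dependent channel shuffle.

    Assumption (not the paper's released code): each group's mixing matrix is
    built as the Kronecker product of two small row-softmax matrices predicted
    from globally pooled features, followed by a fixed cross-group shuffle.
    """

    def __init__(self, channels: int, groups: int = 4, sub: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.group_size = channels // groups
        assert self.group_size % sub == 0
        self.a = sub                      # size of the first small matrix
        self.b = self.group_size // sub   # size of the second small matrix
        # One small head predicts both small matrices from pooled features.
        self.head = nn.Linear(channels, self.a * self.a + self.b * self.b)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        ctx = x.mean(dim=(2, 3))                      # (B, C) global context
        params = self.head(ctx)
        p1 = params[:, : self.a * self.a].view(B, self.a, self.a)
        p2 = params[:, self.a * self.a:].view(B, self.b, self.b)
        # Row-wise softmax: a soft stand-in for a permutation matrix; the
        # paper additionally uses orthogonal regularization and binarization
        # to asymptotically approach a true permutation.
        p1 = F.softmax(p1, dim=-1)
        p2 = F.softmax(p2, dim=-1)
        # Batched Kronecker product: one (group_size x group_size) mixing
        # matrix per sample, shared across groups.
        perm = torch.einsum('bij,bkl->bikjl', p1, p2).reshape(
            B, self.group_size, self.group_size)
        xg = x.view(B, self.groups, self.group_size, H * W)
        mixed = torch.einsum('bmn,bgnl->bgml', perm, xg)
        # Cross-group shuffle (here: a fixed transpose-style interleaving).
        return mixed.transpose(1, 2).reshape(B, C, H, W)


# Usage example: shapes are preserved, only channel mixing changes.
layer = DynamicShuffleSketch(channels=64, groups=4, sub=4)
y = layer(torch.randn(2, 64, 32, 32))   # -> torch.Size([2, 64, 32, 32])
```

Under the same reading, the static-dynamic-shuffle variant mentioned in the abstract would add a learnable static mixing matrix to the dynamically generated one before applying it, giving a lightweight stand-in for an ordinary pointwise convolution.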
Related papers
- GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression [64.47244912937204]
We propose a novel transformer-based entropy model called GroupedMixer.
GroupedMixer enjoys both faster coding speed and better compression performance than previous transformer-based methods.
Experimental results demonstrate that the proposed GroupedMixer yields the state-of-the-art rate-distortion performance with fast compression speed.
arXiv Detail & Related papers (2024-05-02T10:48:22Z) - Ensemble Quadratic Assignment Network for Graph Matching [52.20001802006391]
Graph matching is a commonly used technique in computer vision and pattern recognition.
Recent data-driven approaches have improved the graph matching accuracy remarkably.
We propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
arXiv Detail & Related papers (2024-03-11T06:34:05Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network [0.36122488107441414]
Group-equivariant convolutional neural networks (G-CNN) heavily rely on parameter sharing to increase CNN's data efficiency and performance.
We propose a non-parameter-sharing approach for group-equivariant neural networks.
The proposed methods adaptively aggregate a diverse range of filters by a weighted sum of Monte Carlo augmented decomposed filters.
arXiv Detail & Related papers (2023-05-17T10:18:02Z) - Push--Pull with Device Sampling [8.344476599818826]
We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging information over an underlying communication graph.
We propose an algorithm that combines gradient tracking and variance reduction over the entire network.
Our theoretical analysis shows that the algorithm converges linearly when the local objective functions are strongly convex.
arXiv Detail & Related papers (2022-06-08T18:18:18Z) - Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
arXiv Detail & Related papers (2021-11-24T05:44:31Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation [71.27989298860481]
We address the non-convex optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks [35.67192058479252]
This paper studies the operation of channel shuffle as a regularization technique in deep convolutional networks.
We show that while randomly shuffling entire channels during training drastically reduces performance, randomly shuffling small patches significantly improves it.
The ShuffleBlock module is easy to implement and improves the performance of several baseline networks on the task of image classification on CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2021-06-17T10:23:00Z) - Revisiting Dynamic Convolution via Matrix Decomposition [81.89967403872147]
We propose dynamic channel fusion to replace dynamic attention over channel groups.
Our method is easier to train and requires significantly fewer parameters without sacrificing accuracy.
arXiv Detail & Related papers (2021-03-15T23:03:18Z) - Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps [20.151950843660973]
We introduce kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space complexity.
K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures.
We use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
arXiv Detail & Related papers (2020-12-29T22:51:29Z) - Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences [3.8848561367220276]
We present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization.
The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy.
It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription.
arXiv Detail & Related papers (2020-04-06T12:44:22Z)