Dynamic Shuffle: An Efficient Channel Mixture Method
- URL: http://arxiv.org/abs/2310.02776v1
- Date: Wed, 4 Oct 2023 12:47:48 GMT
- Title: Dynamic Shuffle: An Efficient Channel Mixture Method
- Authors: Kaijun Gong, Zhuowen Yin, Yushu Li, Kailing Guo, Xiangmin Xu
- Abstract summary: We devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling.
Experimental results on image classification benchmark datasets show that our method significantly improves ShuffleNets' performance.
- Score: 8.720510396996142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The redundancy of convolutional neural networks depends not only on
weights but also on inputs. Shuffling is an efficient operation for mixing
channel information, but the shuffle order is usually pre-defined. To reduce
the data-dependent redundancy, we devise a dynamic shuffle module that
generates data-dependent permutation matrices for shuffling. Since the
dimension of the permutation matrix is proportional to the square of the
number of input channels, to make the generation process efficient we divide
the channels into groups, generate two shared small permutation matrices for
each group, and use the Kronecker product together with a cross-group shuffle
to obtain the final permutation matrices. To make the generation process
learnable, and based on theoretical analysis, softmax, orthogonal
regularization, and binarization are employed to asymptotically approximate
the permutation matrix. Dynamic shuffle adaptively mixes channel information
with negligible extra computation and memory occupancy. Experimental results
on the image classification benchmark datasets CIFAR-10, CIFAR-100, Tiny
ImageNet, and ImageNet show that our method significantly improves
ShuffleNets' performance. By adding the dynamically generated matrix to a
learnable static matrix, we further propose static-dynamic-shuffle and show
that it can serve as a lightweight replacement for ordinary pointwise
convolution.
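The abstract describes the mechanism but not an implementation. Below is a minimal, hedged PyTorch sketch of the core idea as stated above: channels are split into groups, two small matrices per group are predicted from pooled features, a row-wise softmax softly approximates permutation matrices, and their Kronecker product followed by a cross-group shuffle mixes the channels. All names here (DynamicShuffleSketch, the prediction head, the groups/sub parameters) are illustrative assumptions rather than the authors' code, and orthogonal regularization and binarization are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicShuffleSketch(nn.Module):
    """Illustrative sketch of a data-dependent channel shuffle.

    Assumption (not the paper's released code): each group's mixing matrix is
    built as the Kronecker product of two small row-softmax matrices predicted
    from globally pooled features, followed by a fixed cross-group shuffle.
    """

    def __init__(self, channels: int, groups: int = 4, sub: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.group_size = channels // groups
        assert self.group_size % sub == 0
        self.a = sub                      # size of the first small matrix
        self.b = self.group_size // sub   # size of the second small matrix
        # One small head predicts both small matrices from pooled features.
        self.head = nn.Linear(channels, self.a * self.a + self.b * self.b)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        ctx = x.mean(dim=(2, 3))                      # (B, C) global context
        params = self.head(ctx)
        p1 = params[:, : self.a * self.a].view(B, self.a, self.a)
        p2 = params[:, self.a * self.a:].view(B, self.b, self.b)
        # Row-wise softmax: a soft stand-in for a permutation matrix; the
        # paper additionally uses orthogonal regularization and binarization
        # to asymptotically approach a true permutation.
        p1 = F.softmax(p1, dim=-1)
        p2 = F.softmax(p2, dim=-1)
        # Batched Kronecker product: one (group_size x group_size) mixing
        # matrix per sample, shared across groups.
        perm = torch.einsum('bij,bkl->bikjl', p1, p2).reshape(
            B, self.group_size, self.group_size)
        xg = x.view(B, self.groups, self.group_size, H * W)
        mixed = torch.einsum('bmn,bgnl->bgml', perm, xg)
        # Cross-group shuffle (here: a fixed transpose-style interleaving).
        return mixed.transpose(1, 2).reshape(B, C, H, W)


# Usage example: shapes are preserved, only channel mixing changes.
layer = DynamicShuffleSketch(channels=64, groups=4, sub=4)
y = layer(torch.randn(2, 64, 32, 32))   # -> torch.Size([2, 64, 32, 32])
```

Under the same reading, the static-dynamic-shuffle variant mentioned in the abstract would add a learnable static mixing matrix to the dynamically generated one before applying it, giving a lightweight stand-in for an ordinary pointwise convolution.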
Related papers
- GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression [64.47244912937204]
We propose a novel transformer-based entropy model called GroupedMixer.
GroupedMixer enjoys both faster coding speed and better compression performance than previous transformer-based methods.
Experimental results demonstrate that the proposed GroupedMixer yields the state-of-the-art rate-distortion performance with fast compression speed.
arXiv Detail & Related papers (2024-05-02T10:48:22Z) - Ensemble Quadratic Assignment Network for Graph Matching [52.20001802006391]
Graph matching is a commonly used technique in computer vision and pattern recognition.
Recent data-driven approaches have improved the graph matching accuracy remarkably.
We propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
arXiv Detail & Related papers (2024-03-11T06:34:05Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network [0.36122488107441414]
Group-equivariant convolutional neural networks (G-CNN) heavily rely on parameter sharing to increase CNN's data efficiency and performance.
We propose a non-parameter-sharing approach for group-equivariant neural networks.
The proposed methods adaptively aggregate a diverse range of filters by a weighted sum of Monte Carlo augmented decomposed filters.
arXiv Detail & Related papers (2023-05-17T10:18:02Z) - Push--Pull with Device Sampling [8.344476599818826]
We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging information over an underlying communication graph.
We propose an algorithm that combines gradient tracking and variance reduction over the entire network.
Our theoretical analysis shows that the algorithm converges linearly when the local objective functions are strongly convex.
arXiv Detail & Related papers (2022-06-08T18:18:18Z) - Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
arXiv Detail & Related papers (2021-11-24T05:44:31Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation [71.27989298860481]
We address the non-convex optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks [35.67192058479252]
This paper studies the operation of channel shuffle as a regularization technique in deep convolutional networks.
We show that while randomly shuffling entire channels during training drastically reduces performance, randomly shuffling small patches significantly improves it.
The ShuffleBlock module is easy to implement and improves the performance of several baseline networks on the task of image classification on CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2021-06-17T10:23:00Z) - Revisiting Dynamic Convolution via Matrix Decomposition [81.89967403872147]
We propose dynamic channel fusion to replace dynamic attention over channel groups.
Our method is easier to train and requires significantly fewer parameters without sacrificing accuracy.
arXiv Detail & Related papers (2021-03-15T23:03:18Z) - Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps [20.151950843660973]
We introduce kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space complexity.
K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures.
We use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
arXiv Detail & Related papers (2020-12-29T22:51:29Z) - Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences [3.8848561367220276]
We present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization.
The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy.
It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription.
arXiv Detail & Related papers (2020-04-06T12:44:22Z)