Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation
- URL: http://arxiv.org/abs/2311.18735v3
- Date: Thu, 10 Oct 2024 14:21:47 GMT
- Title: Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation
- Authors: Suman Sapkota, Binod Bhattarai
- Abstract summary: CNNs, Transformers, and Fourier-Mixers motivated us to look for similarities and differences between them.
We found that these architectures can be interpreted through the lens of a general concept of dimension mixing.
In this work, we study group-wise sparse, non-linear, multi-layered and learnable mixing schemes of inputs and find that they are complementary to many standard neural architectures.
- Score: 11.072628804821083
- License:
- Abstract: The recent success of multiple neural architectures like CNNs, Transformers, and MLP-Mixers motivated us to look for similarities and differences between them. We found that these architectures can be interpreted through the lens of a general concept of dimension mixing. Research on coupling flows and the butterfly transform shows that partial and hierarchical signal mixing schemes are sufficient for efficient and expressive function approximation. In this work, we study group-wise sparse, non-linear, multi-layered and learnable mixing schemes of inputs and find that they are complementary to many standard neural architectures. Following our observations and drawing inspiration from the Fast Fourier Transform, we generalize the Butterfly Structure to use a non-linear mixer function, allowing an MLP as the mixing function; we call this Butterfly MLP. We also sparsely mix along the sequence dimension of Transformer-based architectures, which we call Butterfly Attention. Experiments on the CIFAR and LRA datasets demonstrate that the proposed non-linear Butterfly Mixers are efficient and scale well when the host architectures are used as mixing functions. Additionally, we propose a Patch-Only MLP-Mixer for processing spatial 2D signals, demonstrating a different dimension mixing strategy.
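To make the butterfly idea concrete, here is a minimal PyTorch-style sketch of a non-linear butterfly mixing layer: small MLPs mix groups of dimensions that sit a fixed stride apart, and stacking log_radix(dim) such layers lets every output depend on every input. The class name, radix, hidden width, weight sharing across groups, and residual connection are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ButterflyMLPMixer(nn.Module):
    """Sketch of non-linear butterfly mixing: each layer mixes groups of
    `radix` dimensions that are a fixed stride apart (FFT butterfly pattern)."""
    def __init__(self, dim: int, radix: int = 2, hidden: int = 8):
        super().__init__()
        self.dim, self.radix = dim, radix
        self.depth, d = 0, dim
        while d > 1:
            assert d % radix == 0, "dim must be a power of radix"
            d //= radix
            self.depth += 1
        # one small MLP per butterfly layer, shared across all groups of that layer
        self.mixers = nn.ModuleList([
            nn.Sequential(nn.Linear(radix, hidden), nn.GELU(), nn.Linear(hidden, radix))
            for _ in range(self.depth)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, dim)
        b, r = x.shape[0], self.radix
        for level, mlp in enumerate(self.mixers):
            stride = r ** level
            # collect dimensions that are `stride` apart into groups of size r
            x = x.reshape(b, self.dim // (r * stride), r, stride)  # (b, blocks, r, stride)
            x = x.transpose(2, 3)                                  # (b, blocks, stride, r)
            x = x + mlp(x)                                         # residual non-linear group mix
            x = x.transpose(2, 3).reshape(b, self.dim)
        return x

# After log_radix(dim) sparse layers, every output coordinate can depend on every input.
layer = ButterflyMLPMixer(dim=16, radix=2)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Each layer touches only radix-sized groups, so the per-layer cost stays linear in the input dimension while full mixing is recovered across the stack.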
Related papers
- D2-MLP: Dynamic Decomposed MLP Mixer for Medical Image Segmentation [12.470164287197454]
Convolutional neural networks are widely used in various segmentation tasks in medical images.
However, the inherent locality of convolutional operations limits their ability to learn global features adaptively.
We propose a novel Dynamic Decomposed Mixer module to tackle these limitations.
arXiv Detail & Related papers (2024-09-13T15:16:28Z) - Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking [6.9366619419210656]
Transformers have established themselves as the leading neural network model in natural language processing.
Recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers.
This paper integrates Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the Transformer block.
arXiv Detail & Related papers (2024-06-18T02:42:19Z) - SpiralMLP: A Lightweight Vision MLP Architecture [0.27309692684728615]
We present SpiralMLP, a novel architecture that introduces a Spiral FC layer as a replacement for the conventional Token Mixing approach.
Our study reveals that targeting the full receptive field is not essential for achieving high performance; instead, a more refined approach yields better results.
arXiv Detail & Related papers (2024-03-31T11:33:39Z) - SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z) - Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of weights and biases of a pre-trained MLP.
We show how these layers can be implemented using three basic operations.
arXiv Detail & Related papers (2023-01-30T10:50:33Z) - Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z) - ButterflyFlow: Building Invertible Layers with Butterfly Matrices [80.83142511616262]
We propose a new family of invertible linear layers based on butterfly layers.
Based on our invertible butterfly layers, we construct a new class of normalizing flow models called ButterflyFlow.
arXiv Detail & Related papers (2022-09-28T01:58:18Z) - QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer [10.503972720941693]
Current keyword spotting systems are typically trained with a large amount of pre-defined keywords.
We propose an MLP-based neural network for open-vocabulary keyword spotting that is based on the MLPMixer model architecture.
Our proposed model has a smaller number of parameters and MACs compared to the baseline models.
arXiv Detail & Related papers (2022-06-23T18:18:44Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks (a generic sketch of the mixup interpolation follows this list).
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
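Since the last entry hinges on mixup's interpolation, here is a generic sketch of the standard mixup step: a convex combination of a batch with a shuffled copy of itself, with the coefficient drawn from a Beta distribution. This illustrates the general technique only; the function name, α value, and label encoding are assumptions, not the Mixup-Transformer authors' code.

```python
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Return convex combinations of a batch with a shuffled copy of itself.
    x: (batch, ...) inputs (e.g. token embeddings); y: (batch, num_classes) one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing coefficient in (0, 1)
    perm = torch.randperm(x.size(0))                              # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix

# usage: the mixed inputs and labels replace the originals for one training step
xb = torch.randn(8, 128)
yb = torch.eye(4)[torch.randint(0, 4, (8,))]
print(mixup(xb, yb)[0].shape)  # torch.Size([8, 128])
```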