Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
- URL: http://arxiv.org/abs/2406.12220v1
- Date: Tue, 18 Jun 2024 02:42:19 GMT
- Title: Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
- Authors: Ryo Karakida, Toshihiro Ota, Masato Taki
- Abstract summary: Transformers have established themselves as the leading neural network model in natural language processing.
Recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers.
This paper integrates Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the Transformer block.
- Score: 6.9366619419210656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have established themselves as the leading neural network model in natural language processing and are increasingly foundational in various domains. In vision, the MLP-Mixer model has demonstrated competitive performance, suggesting that attention mechanisms might not be indispensable. Inspired by this, recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers. However, the theoretical framework for these models remains underdeveloped. This paper proposes a novel perspective by integrating Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the entire Transformer block, encompassing token-/channel-mixing modules, layer normalization, and skip connections, as a single Hopfield network. This approach yields a parallelized MLP-Mixer derived from a three-layer Hopfield network, which naturally incorporates symmetric token-/channel-mixing modules and layer normalization. Empirical studies reveal that symmetric interaction matrices in the model hinder performance in image recognition tasks. Introducing symmetry-breaking effects transitions the performance of the symmetric parallelized MLP-Mixer to that of the vanilla MLP-Mixer. This indicates that during standard training, weight matrices of the vanilla MLP-Mixer spontaneously acquire a symmetry-breaking configuration, enhancing their effectiveness. These findings offer insights into the intrinsic properties of Transformers and MLP-Mixers and their theoretical underpinnings, providing a robust framework for future model design and optimization.
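To make the construction concrete, below is a minimal PyTorch sketch of a parallelized Mixer block in the spirit of the abstract: token- and channel-mixing MLPs act in parallel on the same layer-normalized input and are summed with a skip connection, and in the symmetric case each MLP's read-out weight is tied to the transpose of its input weight, as a symmetric Hopfield interaction suggests. All names and shapes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymmetricParallelMixerBlock(nn.Module):
    """Parallelized Mixer block: token- and channel-mixing are applied in
    parallel to one normalized input and summed with a skip connection.
    With symmetric=True the second projection of each MLP reuses the
    transpose of the first, mimicking a symmetric Hopfield interaction."""

    def __init__(self, num_tokens, dim, hidden, symmetric=True):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.tok_in = nn.Linear(num_tokens, hidden, bias=False)  # token axis
        self.ch_in = nn.Linear(dim, hidden, bias=False)          # channel axis
        self.symmetric = symmetric
        if not symmetric:  # symmetry breaking: independent read-out weights
            self.tok_out = nn.Linear(hidden, num_tokens, bias=False)
            self.ch_out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):                            # x: (batch, tokens, dim)
        h = self.norm(x)
        t = F.gelu(self.tok_in(h.transpose(1, 2)))   # (batch, dim, hidden)
        c = F.gelu(self.ch_in(h))                    # (batch, tokens, hidden)
        if self.symmetric:
            t = F.linear(t, self.tok_in.weight.t())  # tied W^T read-out
            c = F.linear(c, self.ch_in.weight.t())
        else:
            t, c = self.tok_out(t), self.ch_out(c)
        return x + t.transpose(1, 2) + c             # skip connection
```

Setting `symmetric=False` unties each weight pair, recovering the two-matrix MLP form of the vanilla Mixer, which is the symmetry-breaking configuration the abstract refers to.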
Related papers
- SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z)
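As a rough illustration of the diagonal-block idea in the entry above, the sketch below replaces the dense FFN projections with grouped 1x1 convolutions, which realize block-diagonal weight matrices and so allow a larger expansion ratio for the same parameter budget. This is a hypothetical layout, not the SCHEME implementation.

```python
import torch
import torch.nn as nn

class BlockDiagonalFFN(nn.Module):
    """Channel mixer with block-diagonal projections: channels are split
    into independent groups, so a given parameter budget supports a larger
    expansion ratio than a dense FFN."""

    def __init__(self, dim, expansion=8, groups=4):
        super().__init__()
        assert dim % groups == 0
        # grouped 1x1 convolutions realize block-diagonal weight matrices
        self.up = nn.Conv1d(dim, dim * expansion, kernel_size=1, groups=groups)
        self.down = nn.Conv1d(dim * expansion, dim, kernel_size=1, groups=groups)
        self.act = nn.GELU()

    def forward(self, x):                  # x: (batch, tokens, dim)
        h = x.transpose(1, 2)              # Conv1d mixes the channel axis
        return self.down(self.act(self.up(h))).transpose(1, 2)
```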
- TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting [13.410217680999459]
Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions.
High memory and computing requirements pose a critical bottleneck for long-term forecasting.
We propose TSMixer, a lightweight neural architecture composed of multi-layer perceptron (MLP) modules.
arXiv Detail & Related papers (2023-06-14T06:26:23Z)
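A minimal sketch of the all-MLP mixing idea above for multivariate series, assuming a simple two-axis layout (one MLP mixes along time, another along features); the actual TSMixer design differs in its details.

```python
import torch
import torch.nn as nn

class TSMixingBlock(nn.Module):
    """All-MLP mixing block for multivariate series: one MLP mixes along
    the time axis, another along the feature axis, each with a skip."""

    def __init__(self, seq_len, n_features, hidden=64, dropout=0.1):
        super().__init__()
        self.time_mlp = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, seq_len))
        self.feat_mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, n_features))

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        x = x + self.time_mlp(x.transpose(1, 2)).transpose(1, 2)
        return x + self.feat_mlp(x)
```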
- iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer [2.5782420501870296]
We generalize studies on the correspondence between Hopfield networks and Transformer-like architectures to propose iMixer.
iMixer is a generalization of the MLP-Mixer whose MLP layers propagate forward from the output side to the input side.
We evaluate the model performance with various datasets on image classification tasks.
The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
arXiv Detail & Related papers (2023-04-25T18:00:08Z)
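One way to read "propagates forward from the output side to the input side" in the entry above is as an implicitly defined layer evaluated by fixed-point iteration. The toy sketch below solves z = x - damping * f(z) by damped Picard iteration; it illustrates the implicit, iterative flavor only and is not the iMixer architecture.

```python
import torch
import torch.nn as nn

class ImplicitMLPLayer(nn.Module):
    """Toy implicit layer: the output z satisfies z = x - damping * f(z)
    and is found by fixed-point iteration. f should be contractive for
    convergence; here a damping factor simply shrinks each update."""

    def __init__(self, dim, hidden, n_iter=8, damping=0.5):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim))
        self.n_iter, self.damping = n_iter, damping

    def forward(self, x):
        z = x
        for _ in range(self.n_iter):       # iterate toward the fixed point
            z = x - self.damping * self.f(z)
        return z
```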
- Nonlinear Hyperspectral Unmixing based on Multilinear Mixing Model using Convolutional Autoencoders [6.867229549627128]
We propose a novel autoencoder-based network for unsupervised unmixing based on the multilinear mixing model, which accounts for multiple reflections of light.
Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-03-14T18:11:52Z)
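A minimal sketch of autoencoder-based unmixing as in the entry above: the encoder maps a pixel spectrum to abundances constrained to the simplex, and the linear decoder's weights play the role of endmember spectra. The paper's multilinear (multiple-reflection) terms are omitted; this is only the linear-mixing baseline, with hypothetical layer sizes.

```python
import torch
import torch.nn as nn

class UnmixingAutoencoder(nn.Module):
    """Unsupervised unmixing autoencoder: encoder outputs abundances
    (nonnegative, summing to one via softmax); the bias-free linear
    decoder's weight columns act as learned endmember spectra."""

    def __init__(self, n_bands, n_endmembers):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_bands, 64), nn.ReLU(),
            nn.Linear(64, n_endmembers))
        self.decoder = nn.Linear(n_endmembers, n_bands, bias=False)

    def forward(self, y):                  # y: (batch, n_bands)
        a = torch.softmax(self.encoder(y), dim=-1)   # abundance simplex
        return self.decoder(a), a          # reconstruction and abundances
```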
- A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
DCGMMs can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z)
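The key point above, that a Gaussian mixture can be trained end-to-end by SGD much like a CNN, can be shown with a flat diagonal-covariance mixture whose negative log-likelihood is minimized by a standard optimizer. The layered convolutional structure of DCGMMs is omitted; this is a deliberately simplified stand-in.

```python
import math
import torch
import torch.nn as nn

class DiagonalGMM(nn.Module):
    """Flat Gaussian mixture with diagonal covariances; its log-likelihood
    is differentiable, so the parameters can be fit by plain SGD."""

    def __init__(self, n_components, dim):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_components))
        self.means = nn.Parameter(torch.randn(n_components, dim))
        self.log_std = nn.Parameter(torch.zeros(n_components, dim))

    def log_prob(self, x):                       # x: (batch, dim)
        z = (x[:, None, :] - self.means) / self.log_std.exp()
        log_comp = (-0.5 * z.pow(2).sum(-1) - self.log_std.sum(-1)
                    - 0.5 * self.means.size(1) * math.log(2 * math.pi))
        log_w = torch.log_softmax(self.logits, dim=0)
        return torch.logsumexp(log_w + log_comp, dim=-1)

# end-to-end SGD from random initial conditions, as with a CNN
model = DiagonalGMM(n_components=5, dim=2)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(256, 2)                          # stand-in data
opt.zero_grad()
loss = -model.log_prob(x).mean()                 # negative log-likelihood
loss.backward()
opt.step()
```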
- ActiveMLP: An MLP-like Architecture with Active Token Mixer [54.95923719553343]
This paper presents ActiveMLP, a general MLP-like backbone for computer vision.
We propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate contextual information from other tokens in the global scope into the given one.
In this way, the spatial range of token-mixing is expanded and the way of token-mixing is reformed.
arXiv Detail & Related papers (2022-03-11T17:29:54Z)
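A 1D caricature of the "active" token-mixing idea above: each token predicts where in the sequence to look, the feature at that position is gathered, and the result is fused with the token itself. The real ATM operates on 2D grids with per-channel offsets and differentiable sampling; this toy version is only an analogue, and its hard round() blocks gradients to the offset head.

```python
import torch
import torch.nn as nn

class ToyActiveTokenMixer(nn.Module):
    """Each token predicts a displacement, the token at that position is
    gathered, and both features are fused. A simplified 1D analogue of an
    active token mixer, not the ActiveMLP implementation."""

    def __init__(self, dim):
        super().__init__()
        self.offset = nn.Linear(dim, 1)    # one predicted displacement per token
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):                  # x: (batch, tokens, dim)
        B, T, D = x.shape
        pos = torch.arange(T, device=x.device).view(1, T, 1).float()
        idx = (pos + self.offset(x)).round().clamp(0, T - 1).long()
        gathered = torch.gather(x, 1, idx.expand(B, T, D))
        return self.fuse(torch.cat([x, gathered], dim=-1))
```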
- DynaMixer: A Vision MLP Architecture with Dynamic Mixing [38.23027495545522]
This paper presents an efficient MLP-like network architecture, dubbed DynaMixer, resorting to dynamic information fusion.
We propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed.
Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K, performing favorably against the state-of-the-art vision models.
arXiv Detail & Related papers (2022-01-28T12:43:14Z)
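A minimal sketch of the content-dependent mixing described above: a small head maps the (dimension-reduced) token contents to a tokens-by-tokens mixing matrix, which is normalized and applied along the token axis. A simplified stand-in for DynaMixer's mixing-matrix generation, not its exact procedure.

```python
import torch
import torch.nn as nn

class DynamicTokenMixer(nn.Module):
    """Content-dependent token mixing: token contents are compressed,
    mapped to a (tokens x tokens) matrix, softmax-normalized, and used
    to mix the tokens."""

    def __init__(self, num_tokens, dim, reduced=8):
        super().__init__()
        self.reduce = nn.Linear(dim, reduced)     # compress each token first
        self.to_matrix = nn.Linear(num_tokens * reduced,
                                   num_tokens * num_tokens)

    def forward(self, x):                         # x: (batch, tokens, dim)
        B, T, _ = x.shape
        flat = self.reduce(x).reshape(B, -1)      # (B, T * reduced)
        m = self.to_matrix(flat).reshape(B, T, T)
        m = torch.softmax(m, dim=-1)              # each row mixes over tokens
        return m @ x                              # dynamically mixed tokens
```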
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps, as well as multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
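A compressed caricature of the transformer-based cost aggregation above: build a cosine correlation map between two feature maps and refine it with self-attention. The multi-level features and appearance-affinity terms of CATs are omitted, and `hw` is assumed to equal H*W and be divisible by the head count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCostAggregator(nn.Module):
    """Correlation map between two L2-normalized feature maps, refined by
    self-attention over the flattened cost volume."""

    def __init__(self, hw, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hw, n_heads, batch_first=True)

    def forward(self, feat_a, feat_b):              # each: (batch, C, H, W)
        a = F.normalize(feat_a.flatten(2), dim=1)   # (B, C, HW)
        b = F.normalize(feat_b.flatten(2), dim=1)
        corr = a.transpose(1, 2) @ b                # (B, HW, HW) matching costs
        refined, _ = self.attn(corr, corr, corr)    # aggregate the cost volume
        return corr + refined                       # residual refinement
```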
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
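For reference, a minimal sketch of the Mixer block consistent with the paper's description: a token-mixing MLP across patches, then a channel-mixing MLP across features, each with pre-LayerNorm and a skip connection. Hyperparameter names are illustrative.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Token-mixing MLP along the patch axis followed by a channel-mixing
    MLP along the feature axis, each pre-normalized with a skip."""

    def __init__(self, num_patches, dim, token_hidden, channel_hidden):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_patches))
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):                  # x: (batch, patches, dim)
        y = self.norm1(x).transpose(1, 2)  # mix along the patch axis
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))
```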
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.