iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer
- URL: http://arxiv.org/abs/2304.13061v2
- Date: Mon, 1 Apr 2024 06:42:17 GMT
- Title: iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer
- Authors: Toshihiro Ota, Masato Taki
- Abstract summary: We generalize the correspondence between Hopfield networks and Transformer-like architectures to the hierarchical Hopfield network, yielding iMixer.
iMixer is a generalization of the MLP-Mixer whose MLP layers propagate forward from the output side to the input side.
We evaluate the model performance on image classification tasks with various datasets.
The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
- Score: 2.5782420501870296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak inductive bias, these models have achieved performance comparable to well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest a correspondence between certain energy-based associative memory models and Transformers or the MLP-Mixer, and shed some light on the theoretical background of Transformer-type architecture design. In this paper, we generalize the correspondence to the recently introduced hierarchical Hopfield network and find iMixer, a novel generalization of the MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of an invertible, implicit, and iterative mixing module. We evaluate the model performance on image classification tasks with various datasets, and find that iMixer, despite its unique architecture, exhibits stable learning capabilities and achieves performance comparable to or better than the baseline vanilla MLP-Mixer. The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
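For intuition about the "invertible, implicit, and iterative" mixing module, below is a minimal sketch of the general idea: the mixing output is taken as the approximate inverse of a residual MLP map y = z + f(z), recovered by fixed-point iteration z_{k+1} = y - f(z_k). When f is a contraction, this iteration converges to the unique z with z + f(z) = y, which is what makes the module invertible and implicit. The two-layer MLP, the scaling factor that keeps the map contractive, and the iteration count are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z, W1, W2):
    """A small two-layer MLP (ReLU for simplicity), scaled so that
    the residual map z -> z + f(z) remains invertible (f is a contraction)."""
    return 0.1 * (np.maximum(z @ W1, 0.0) @ W2)

def inverse_mixing(y, W1, W2, n_iters=10):
    """Approximate z = (id + f)^{-1}(y) by fixed-point iteration
    z_{k+1} = y - f(z_k), starting from z_0 = y."""
    z = y
    for _ in range(n_iters):
        z = y - f(z, W1, W2)
    return z

# Toy check on random token features (tokens x channels).
d = 16
W1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)
W2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)
y = rng.normal(size=(8, d))

z = inverse_mixing(y, W1, W2)
print(np.max(np.abs((z + f(z, W1, W2)) - y)))  # near zero if the iteration converged
```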
Related papers
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models [76.64055251296548]
We show that it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers with academic GPU resources.
The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks.
arXiv Detail & Related papers (2024-08-27T17:56:11Z) - Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking [6.9366619419210656]
Transformers have established themselves as the leading neural network model in natural language processing.
Recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers.
This paper integrates Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the Transformer block.
arXiv Detail & Related papers (2024-06-18T02:42:19Z) - Mixer is more than just a model [23.309064032922507]
This study focuses on the domain of audio recognition, introducing a novel model named Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH).
Experimental results demonstrate that ASM-RH is particularly well-suited for audio data and yields promising outcomes across multiple classification tasks.
arXiv Detail & Related papers (2024-02-28T02:45:58Z) - SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z) - TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting [13.410217680999459]
Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions.
High memory and computing requirements pose a critical bottleneck for long-term forecasting.
We propose TSMixer, a lightweight neural architecture composed of multi-layer perceptron (MLP) modules.
arXiv Detail & Related papers (2023-06-14T06:26:23Z) - HyperMixer: An MLP-based Low Cost Alternative to Transformers [12.785548869229052]
We propose a simple variant, HyperMixer, which forms the token-mixing MLP dynamically using hypernetworks.
In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
arXiv Detail & Related papers (2022-03-07T20:23:46Z) - PointMixer: MLP-Mixer for Point Cloud Understanding [74.694733918351]
The concept of channel-mixing and token-mixing MLPs achieves noticeable performance in visual recognition tasks.
Unlike images, point clouds are inherently sparse, unordered and irregular, which limits the direct use of MLP-Mixer for point cloud understanding.
We propose PointMixer, a universal point set operator that facilitates information sharing among unstructured 3D points.
arXiv Detail & Related papers (2021-11-22T13:25:54Z) - A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models (a minimal sketch of one Mixer block appears after this list).
arXiv Detail & Related papers (2021-05-04T16:17:21Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
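Since the paper above uses the vanilla MLP-Mixer as its baseline, here is the minimal single-image sketch of one Mixer block referenced in the MLP-Mixer entry: a token-mixing MLP applied across the token axis, followed by a channel-mixing MLP across the channel axis, each with layer normalization and a skip connection. The dimensions and initialization are illustrative, and batching, dropout, and the patch-embedding stem are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, W1, b1, W2, b2):
    return gelu(x @ W1 + b1) @ W2 + b2

def mixer_block(x, token_params, channel_params):
    """One Mixer block: token-mixing MLP over the token axis,
    then channel-mixing MLP over the channel axis, both with skip connections."""
    y = layer_norm(x)
    x = x + mlp(y.T, *token_params).T   # transpose so the MLP mixes tokens
    y = layer_norm(x)
    return x + mlp(y, *channel_params)  # this MLP mixes channels

def init(d_in, d_hidden, d_out):
    return (rng.normal(size=(d_in, d_hidden)) * 0.02, np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)) * 0.02, np.zeros(d_out))

tokens, channels, hidden = 196, 128, 256
x = rng.normal(size=(tokens, channels))
out = mixer_block(x, init(tokens, hidden, tokens), init(channels, hidden, channels))
print(out.shape)  # (196, 128)
```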
This list is automatically generated from the titles and abstracts of the papers on this site.