Mixer is more than just a model
- URL: http://arxiv.org/abs/2402.18007v2
- Date: Sat, 2 Mar 2024 03:32:40 GMT
- Title: Mixer is more than just a model
- Authors: Qingfeng Ji, Yuxin Wang, Letong Sun
- Abstract summary: This study focuses on the domain of audio recognition, introducing a novel model named Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH).
Experimental results demonstrate that ASM-RH is particularly well-suited for audio data and yields promising outcomes across multiple classification tasks.
- Score: 23.309064032922507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, MLP structures have regained popularity, with MLP-Mixer standing
out as a prominent example. In the field of computer vision, MLP-Mixer is noted
for its ability to extract information from both channel and token
perspectives, effectively acting as a fusion of channel and token information.
Indeed, Mixer represents a paradigm for information extraction that amalgamates
channel and token information. The essence of Mixer lies in its ability to
blend information from diverse perspectives, epitomizing the true concept of
"mixing" in the realm of neural network architectures. Beyond channel and token
considerations, it is possible to create more tailored mixers from various
perspectives to better suit specific task requirements. This study focuses on
the domain of audio recognition, introducing a novel model named Audio
Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) that incorporates
insights from both time and frequency domains. Experimental results demonstrate
that ASM-RH is particularly well-suited for audio data and yields promising
outcomes across multiple classification tasks. The models and optimal weight
files will be published.
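The abstract's central claim is that mixing is a general paradigm: beyond the channel and token axes used in vision, ASM-RH blends the time and frequency views of a spectrogram. Below is a minimal sketch of that two-axis mixing pattern, assuming a (batch, time, freq) spectrogram layout; the class names, hidden width, and shapes are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP applied along the last dimension."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)

class TimeFreqMixerBlock(nn.Module):
    """One Mixer block that alternately mixes along the time axis and
    the frequency axis of a spectrogram shaped (batch, time, freq).
    Hypothetical sketch, not the ASM-RH reference code."""
    def __init__(self, n_time, n_freq, hidden=256):
        super().__init__()
        self.norm_t = nn.LayerNorm(n_freq)
        self.norm_f = nn.LayerNorm(n_freq)
        self.time_mlp = MlpBlock(n_time, hidden)   # mixes across time frames
        self.freq_mlp = MlpBlock(n_freq, hidden)   # mixes across frequency bins

    def forward(self, x):                          # x: (batch, time, freq)
        y = self.norm_t(x).transpose(1, 2)         # (batch, freq, time)
        x = x + self.time_mlp(y).transpose(1, 2)   # time mixing + residual
        x = x + self.freq_mlp(self.norm_f(x))      # frequency mixing + residual
        return x

spec = torch.randn(8, 100, 128)                    # 8 clips, 100 frames, 128 mel bins
print(TimeFreqMixerBlock(n_time=100, n_freq=128)(spec).shape)  # torch.Size([8, 100, 128])
```

Each residual branch mixes along a single axis, so stacking such blocks lets time-domain and frequency-domain information interact, mirroring the channel/token interplay of the original MLP-Mixer.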
Related papers
- EMOFM: Ensemble MLP mOdel with Feature-based Mixers for Click-Through
Rate Prediction [5.983194751474721]
The dataset contains millions of records, and each field-wise feature in a record consists of hashed integers for privacy.
For this task, the key to network-based methods might be type-wise feature extraction and information fusion across different fields.
We propose plug-in mixers for field/type-wise feature fusion, thus constructing a field&type-wise ensemble model, namely EMOFM.
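Since each record is a row of hashed integers per field, a plug-in field-wise mixer can be pictured as embedding every field and then mixing across the field axis, in analogy to token mixing. A hedged sketch of that idea (FieldWiseMixer, the bucket count, and the embedding size are assumptions for illustration, not EMOFM's actual components):

```python
import torch
import torch.nn as nn

class FieldWiseMixer(nn.Module):
    """Sketch of a plug-in field-wise mixer for CTR features: hashed
    integers are embedded per field, then an MLP mixes information
    across the field axis (cf. token mixing). Illustrative only."""
    def __init__(self, n_fields, n_hash_buckets, dim):
        super().__init__()
        self.embed = nn.Embedding(n_hash_buckets, dim)
        self.mix_fields = nn.Sequential(
            nn.Linear(n_fields, n_fields * 2), nn.ReLU(),
            nn.Linear(n_fields * 2, n_fields),
        )

    def forward(self, hashed):                     # hashed: (batch, n_fields) ints
        e = self.embed(hashed)                     # (batch, n_fields, dim)
        mixed = self.mix_fields(e.transpose(1, 2)).transpose(1, 2)  # mix across fields
        return e + mixed                           # residual fusion

x = torch.randint(0, 1000, (4, 10))
print(FieldWiseMixer(n_fields=10, n_hash_buckets=1000, dim=16)(x).shape)
# torch.Size([4, 10, 16])
```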
arXiv Detail & Related papers (2023-10-06T12:32:23Z)
- iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer [2.5782420501870296]
We generalize studies on the correspondence between Hopfield networks and Transformer-like architectures, arriving at iMixer.
Unlike ordinary feedforward modules, iMixer propagates forward from the output side to the input side.
We evaluate the model performance with various datasets on image classification tasks.
The results imply that the correspondence between the Hopfield networks and the Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
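The phrase "propagates forward from the output side to the input side" describes an implicit layer: the forward pass solves an equation defined in the inverse direction rather than evaluating a feedforward map. A toy illustration using naive fixed-point iteration (not the paper's actual module; convergence is simply assumed for small weights):

```python
import torch
import torch.nn as nn

class ImplicitMixing(nn.Module):
    """Toy implicit layer: the output y is defined by y = x - f(y),
    so computing the forward pass means solving an equation whose
    natural direction runs from output to input. Illustrative only;
    convergence of the iteration is assumed, not guaranteed."""
    def __init__(self, dim, n_iter=20):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.n_iter = n_iter

    def forward(self, x):
        y = x
        for _ in range(self.n_iter):               # naive fixed-point iteration
            y = x - self.f(y)
        return y

x = torch.randn(4, 32)
print(ImplicitMixing(32)(x).shape)                  # torch.Size([4, 32])
```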
arXiv Detail & Related papers (2023-04-25T18:00:08Z)
- Zorro: the masked multimodal transformer [68.99684436029884]
Zorro is a technique that uses masks to control how inputs from each modality are routed inside Transformers.
We show that with contrastive pre-training Zorro achieves state-of-the-art results on most relevant benchmarks for multimodal tasks.
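The routing can be pictured as a block-structured attention mask: unimodal tokens attend only within their own modality, while a set of fusion tokens attends to everything. A hedged sketch of such a mask (the stream sizes and layout are assumptions, not the paper's exact configuration):

```python
import torch

def zorro_style_mask(n_audio, n_video, n_fusion):
    """Boolean attention mask: True = query may attend to key.
    Audio and video tokens attend only within their own modality;
    fusion tokens attend to everything, so cross-modal information
    flows only into the fusion stream. Illustrative layout."""
    n = n_audio + n_video + n_fusion
    mask = torch.zeros(n, n, dtype=torch.bool)
    a = slice(0, n_audio)
    v = slice(n_audio, n_audio + n_video)
    f = slice(n_audio + n_video, n)
    mask[a, a] = True          # audio -> audio
    mask[v, v] = True          # video -> video
    mask[f, :] = True          # fusion -> all tokens
    return mask

m = zorro_style_mask(n_audio=4, n_video=4, n_fusion=2)
print(m.int())
```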
arXiv Detail & Related papers (2023-01-23T17:51:39Z)
- SplitMixer: Fat Trimmed From MLP-like Models [53.12472550578278]
We present SplitMixer, a simple and lightweight isotropic MLP-like architecture for visual recognition.
It contains two types of interleaving convolutional operations to mix information across locations (spatial mixing) and channels (channel mixing).
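The two interleaved operations can be approximated as (i) a depthwise k x k convolution split into two 1-D convolutions for spatial mixing, and (ii) a pointwise convolution applied to only a segment of the channels, which is where parameters are trimmed. A rough sketch under those assumptions (kernel size and segment fraction are illustrative):

```python
import torch
import torch.nn as nn

class SplitMixerBlock(nn.Module):
    """Sketch: spatial mixing via a k x k depthwise kernel split into
    1 x k and k x 1 pieces, plus channel mixing on only a segment of
    the channels. Illustrative, not the paper's exact block."""
    def __init__(self, dim, kernel=5, segment=0.5):
        super().__init__()
        pad = kernel // 2
        self.conv_h = nn.Conv2d(dim, dim, (1, kernel), padding=(0, pad), groups=dim)
        self.conv_v = nn.Conv2d(dim, dim, (kernel, 1), padding=(pad, 0), groups=dim)
        self.part = int(dim * segment)
        self.channel = nn.Conv2d(self.part, self.part, 1)    # pointwise, partial channels

    def forward(self, x):                          # x: (batch, dim, H, W)
        x = x + self.conv_v(self.conv_h(x))        # split spatial mixing + residual
        x = torch.cat([self.channel(x[:, :self.part]),       # mix first segment only
                       x[:, self.part:]], dim=1)
        return x

x = torch.randn(2, 64, 32, 32)
print(SplitMixerBlock(64)(x).shape)                # torch.Size([2, 64, 32, 32])
```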
arXiv Detail & Related papers (2022-07-21T01:37:07Z)
- ActiveMLP: An MLP-like Architecture with Active Token Mixer [54.95923719553343]
This paper presents ActiveMLP, a general MLP-like backbone for computer vision.
We propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate contextual information from other tokens in the global scope into the given one.
In this way, the spatial range of token-mixing is expanded and the way of token-mixing is reformed.
arXiv Detail & Related papers (2022-03-11T17:29:54Z)
- DynaMixer: A Vision MLP Architecture with Dynamic Mixing [38.23027495545522]
This paper presents an efficient MLP-like network architecture, dubbed DynaMixer, resorting to dynamic information fusion.
We propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed.
Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset, performing favorably against the state-of-the-art vision models.
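The key step is that the token-mixing matrix is not a fixed learned weight but is generated per sample from the token contents. A minimal sketch of one plausible realization (the dimensionality reduction and softmax normalization are assumptions, not the paper's exact procedure):

```python
import torch
import torch.nn as nn

class DynamicTokenMixer(nn.Module):
    """Generates an (n_tokens x n_tokens) mixing matrix from the token
    contents, then mixes the tokens with it. Simplified illustration
    of content-dependent mixing, not DynaMixer's exact module."""
    def __init__(self, n_tokens, dim, reduced_dim=8):
        super().__init__()
        self.reduce = nn.Linear(dim, reduced_dim)  # compress each token first
        self.generate = nn.Linear(n_tokens * reduced_dim, n_tokens * n_tokens)
        self.n_tokens = n_tokens

    def forward(self, x):                          # x: (batch, n_tokens, dim)
        b, n, _ = x.shape
        flat = self.reduce(x).reshape(b, -1)       # (batch, n * reduced_dim)
        mix = self.generate(flat).reshape(b, n, n) # per-sample mixing matrix
        mix = mix.softmax(dim=-1)                  # rows sum to 1
        return mix @ x                             # content-dependent token mixing

tokens = torch.randn(2, 16, 64)
print(DynamicTokenMixer(n_tokens=16, dim=64)(tokens).shape)  # torch.Size([2, 16, 64])
```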
arXiv Detail & Related papers (2022-01-28T12:43:14Z)
- PointMixer: MLP-Mixer for Point Cloud Understanding [74.694733918351]
The concept of channel-mixing and token-mixing MLPs achieves noticeable performance in visual recognition tasks.
Unlike images, point clouds are inherently sparse, unordered and irregular, which limits the direct use of MLP-Mixer for point cloud understanding.
We propose PointMixer, a universal point set operator that facilitates information sharing among unstructured 3D points.
arXiv Detail & Related papers (2021-11-22T13:25:54Z)
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
- MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks [97.08677678499075]
We introduce MixMo, a new framework for learning multi-input multi-output deep subnetworks.
We show that binary mixing in features - particularly with patches from CutMix - enhances results by making subnetworks stronger and more diverse.
In addition to being easy to implement and adding no cost at inference, our models outperform much costlier data augmented deep ensembles.
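The binary feature mixing can be pictured as a CutMix-style patch swap between the embeddings of two inputs before the shared trunk. A hedged sketch of that mixing step (tensor shapes and patch sampling are illustrative assumptions):

```python
import torch

def cutmix_features(f1, f2, patch=8):
    """Binary mixing of two feature maps (batch, channels, H, W):
    copies a random patch x patch window of f2 into f1. Illustrative
    simplification of CutMix-style feature mixing."""
    _, _, h, w = f1.shape
    top = torch.randint(0, h - patch + 1, (1,)).item()
    left = torch.randint(0, w - patch + 1, (1,)).item()
    mixed = f1.clone()
    mixed[:, :, top:top + patch, left:left + patch] = \
        f2[:, :, top:top + patch, left:left + patch]
    return mixed

f1, f2 = torch.randn(4, 32, 16, 16), torch.randn(4, 32, 16, 16)
print(cutmix_features(f1, f2).shape)               # torch.Size([4, 32, 16, 16])
```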
arXiv Detail & Related papers (2021-03-10T15:31:02Z)