MLPs Learn In-Context on Regression and Classification Tasks
- URL: http://arxiv.org/abs/2405.15618v2
- Date: Thu, 26 Sep 2024 16:05:30 GMT
- Title: MLPs Learn In-Context on Regression and Classification Tasks
- Authors: William L. Tong, Cengiz Pehlevan
- Abstract summary: In-context learning (ICL) is often assumed to be a unique hallmark of Transformer models.
We demonstrate that multi-layer perceptrons (MLPs) can also learn in-context.
- Score: 28.13046236900491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget in this setting. We further show that MLPs outperform Transformers on a series of classical tasks from psychology designed to test relational reasoning, which are closely related to in-context classification. These results underscore a need for studying in-context learning beyond attention-based architectures, while also challenging strong prior arguments about MLPs' limited ability to solve relational tasks. Altogether, our results highlight the unexpected competence of MLPs, and support the growing interest in all-MLP alternatives to task-specific architectures.
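The synthetic ICL regression setup the abstract refers to can be sketched as follows: each task draws a fresh latent linear map, the context exemplars and the query are flattened into one vector, and an MLP is trained to predict the query label. This is a minimal illustration, not the paper's implementation; the context length, input dimension, hidden width, and learning rate below are arbitrary assumptions.

```python
import numpy as np

N_CTX, DIM, HID = 8, 4, 64          # assumed context length, input dim, hidden width
D_IN = N_CTX * (DIM + 1) + DIM      # flattened (x_i, y_i) pairs plus the query x

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(D_IN, HID)); b1 = np.zeros(HID)
W2 = rng.normal(scale=0.1, size=(HID, 1));   b2 = np.zeros(1)

def make_batch(batch=128):
    """Sample linear-regression ICL tasks. Each task has its own latent
    weight vector w, so the query label y_q = w . x_q can only be
    inferred from the in-context exemplars."""
    w = rng.normal(size=(batch, DIM))
    xs = rng.normal(size=(batch, N_CTX + 1, DIM))
    ys = np.einsum("bd,bkd->bk", w, xs)
    inp = np.concatenate([xs[:, :N_CTX].reshape(batch, -1),  # context x's
                          ys[:, :N_CTX],                     # context y's
                          xs[:, N_CTX]], axis=1)             # query x
    return inp, ys[:, N_CTX]

def train_step(X, y, lr=1e-2):
    """One full-batch gradient step on MSE for a one-hidden-layer ReLU MLP."""
    h = np.maximum(X @ W1 + b1, 0.0)
    err = (h @ W2 + b2).ravel() - y
    g_out = (2.0 / len(y)) * err[:, None]   # dLoss/dpred
    g_h = (g_out @ W2.T) * (h > 0)          # backprop through ReLU
    for p, g in ((W2, h.T @ g_out), (b2, g_out.sum(0)),
                 (W1, X.T @ g_h), (b1, g_h.sum(0))):
        p -= lr * g
    return float(np.mean(err ** 2))

for step in range(500):                     # a fresh set of tasks every step
    loss = train_step(*make_batch())
```

In this framing, in-context learning simply means the trained network's prediction for the query comes to depend on the exemplars in its input; the paper's claim is that MLPs trained this way are competitive with Transformers at matched compute.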
Related papers
- KAN or MLP: A Fairer Comparison [63.794304207664176]
This paper offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks.
We control the number of parameters and FLOPs to compare the performance of KAN and MLP.
We find that KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting.
arXiv Detail & Related papers (2024-07-23T17:43:35Z)
- MLPs Compass: What is learned when MLPs are combined with PLMs? [20.003022732050994]
Multilayer Perceptron (MLP) modules achieve robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs).
This paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information.
arXiv Detail & Related papers (2024-01-03T11:06:01Z)
- MLP-AIR: An Efficient MLP-Based Method for Actor Interaction Relation Learning in Group Activity Recognition [4.24515544235173]
Group Activity Recognition (GAR) aims to predict the activity category of the group by learning the actor spatial-temporal interaction relations in the group.
Previous works mainly learn the interaction relation by the well-designed GCNs or Transformers.
In this paper, we design a novel MLP-based method for Actor Interaction Relation learning (MLP-AIR) in GAR.
arXiv Detail & Related papers (2023-04-18T08:07:23Z)
- Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both the feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
- MLP Architectures for Vision-and-Language Modeling: An Empirical Study [91.6393550858739]
We initiate the first empirical study on the use of MLP architectures for vision-and-language (VL) fusion.
We find that without pre-training, using MLPs for multimodal fusion has a noticeable performance gap compared to transformers.
Instead of heavy multi-head attention, adding tiny one-head attention to MLP encoders is sufficient to achieve comparable performance to transformers.
arXiv Detail & Related papers (2021-12-08T18:26:19Z)
- Hire-MLP: Vision MLP via Hierarchical Rearrangement [58.33383667626998]
Hire-MLP is a simple yet competitive vision MLP architecture via hierarchical rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing operations, thus enjoys high flexibility and inference speed.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
arXiv Detail & Related papers (2021-08-30T16:11:04Z)
- AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different directions.
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
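The axial-shift idea can be illustrated with a small sketch: channel groups are displaced by different offsets along one spatial axis, so a subsequent channel-mixing (1x1) MLP sees features from spatially neighboring positions. This is a simplified illustration, not the paper's code; the group count and shift size are arbitrary, and `np.roll` wraps around at the borders, whereas AS-MLP zero-pads.

```python
import numpy as np

def axial_shift(x, axis, shift=1, groups=5):
    """Split channels of x (N, C, H, W) into groups and shift each group
    by a different offset along one spatial axis. Simplified: wrap-around
    via np.roll instead of the paper's zero-padding."""
    chunks = np.array_split(x, groups, axis=1)
    offsets = [(i - groups // 2) * shift for i in range(groups)]
    return np.concatenate(
        [np.roll(ch, off, axis=axis) for ch, off in zip(chunks, offsets)],
        axis=1)

x = np.random.default_rng(0).normal(size=(1, 10, 8, 8))
h_mix = axial_shift(x, axis=2)   # gather information along the height axis
w_mix = axial_shift(x, axis=3)   # gather information along the width axis
```

Applying the shift once along height and once along width is what lets purely channel-wise MLP layers obtain information flow from different spatial directions.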
arXiv Detail & Related papers (2021-07-18T08:56:34Z)
- Rethinking Token-Mixing MLP for MLP-based Vision Backbone [34.47616917228978]
We propose an improved structure termed Circulant Channel-Specific (CCS) token-mixing MLP, which is spatial-invariant and channel-specific.
It takes fewer parameters but achieves higher classification accuracy on ImageNet1K.
arXiv Detail & Related papers (2021-06-28T17:59:57Z)
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
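Mixer's core operation alternates two MLPs over a (patches, channels) table: one mixes across patches (token mixing), the other across features (channel mixing). A minimal sketch, with LayerNorm omitted, ReLU substituted for the paper's GELU, and all dimensions chosen arbitrarily:

```python
import numpy as np

def mlp(x, W_in, W_out):
    # two-layer MLP applied row-wise; ReLU for brevity (the paper uses GELU)
    return np.maximum(x @ W_in, 0.0) @ W_out

def mixer_block(x, tok_w, ch_w):
    """One simplified Mixer block on a (patches, channels) table,
    with skip connections but without LayerNorm."""
    x = x + mlp(x.T, *tok_w).T   # token mixing: operates across patches
    x = x + mlp(x, *ch_w)        # channel mixing: operates across features
    return x

rng = np.random.default_rng(0)
P, C, H = 16, 8, 32              # patches, channels, hidden width (arbitrary)
tok_w = (rng.normal(scale=0.1, size=(P, H)), rng.normal(scale=0.1, size=(H, P)))
ch_w  = (rng.normal(scale=0.1, size=(C, H)), rng.normal(scale=0.1, size=(H, C)))
out = mixer_block(rng.normal(size=(P, C)), tok_w, ch_w)
```

Because the token-mixing weights are shared across channels and the channel-mixing weights across patches, the block exchanges information in both directions without any attention mechanism.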
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.