Comparing Generalization in Learning with Limited Numbers of Exemplars:
Transformer vs. RNN in Attractor Dynamics
- URL: http://arxiv.org/abs/2311.10763v1
- Date: Wed, 15 Nov 2023 00:37:49 GMT
- Title: Comparing Generalization in Learning with Limited Numbers of Exemplars:
Transformer vs. RNN in Attractor Dynamics
- Authors: Rui Fukushima and Jun Tani
- Abstract summary: ChatGPT, a widely-recognized large language model (LLM), has recently gained substantial attention for its performance scaling.
This raises a crucial question about Transformer's generalization-in-learning (GIL) capacity.
We compare Transformer's GIL capabilities with those of a traditional Recurrent Neural Network (RNN) in tasks involving attractor dynamics learning.
- Score: 3.5353632767823497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ChatGPT, a widely-recognized large language model (LLM), has recently gained
substantial attention for its performance scaling, attributed to the billions
of web-sourced natural language sentences used for training. Its underlying
architecture, Transformer, has found applications across diverse fields,
including video, audio signals, and robotic movement. This raises a crucial
question about the Transformer's generalization-in-learning (GIL) capacity. Is
ChatGPT's success chiefly due to
the vast dataset used for training, or is there more to the story? To
investigate this, we compared Transformer's GIL capabilities with those of a
traditional Recurrent Neural Network (RNN) in tasks involving attractor
dynamics learning. For performance evaluation, the Dynamic Time Warping (DTW)
method has been employed. Our simulation results suggest that under conditions
of limited data availability, Transformer's GIL abilities are markedly inferior
to those of RNN.
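To make the evaluation metric concrete, below is a minimal sketch of how Dynamic Time Warping (DTW) can score the discrepancy between a model-generated trajectory and a target trajectory. The limit-cycle target and the noisy, phase-shifted prediction are illustrative assumptions for demonstration only, not the attractor-dynamics data used in the paper.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW between two
    multivariate sequences a and b, each of shape [T, dim]."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local Euclidean distance
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Illustrative stand-ins (hypothetical data, not from the paper): a simple
# limit-cycle target and a phase-shifted, noisy "regenerated" trajectory.
t = np.linspace(0.0, 4.0 * np.pi, 200)
target = np.stack([np.sin(t), np.cos(t)], axis=1)
predicted = np.stack([np.sin(t + 0.3), np.cos(t + 0.3)], axis=1) \
            + 0.05 * np.random.default_rng(0).normal(size=(200, 2))

print("DTW distance (lower is better):", dtw_distance(target, predicted))
```

Because DTW aligns the two sequences in time before accumulating distances, it penalizes distorted attractor shapes rather than mere phase shifts, which makes it a natural score for regenerated trajectories.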
Related papers
- Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent [26.764893400499354]
We show that linear looped Transformers can implement multi-step gradient descent efficiently for in-context learning.
Our results demonstrate that as long as the input data has a constant condition number, $n = O(d)$, the linear looped Transformers can achieve a small error (a minimal numerical sketch of this multi-step in-context gradient descent appears after this list).
arXiv Detail & Related papers (2024-10-15T04:44:23Z)
- Transformer Explainer: Interactive Learning of Text-Generative Models [65.91049787390692]
Transformer Explainer is an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model.
It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together.
arXiv Detail & Related papers (2024-08-08T17:49:07Z)
- GTC: GNN-Transformer Co-contrastive Learning for Self-supervised Heterogeneous Graph Representation [0.9249657468385781]
This paper proposes a collaborative learning scheme for GNN-Transformer and constructs GTC architecture.
For the Transformer branch, we propose Metapath-aware Hop2Token and CG-Hetphormer, which can cooperate with GNN to attentively encode neighborhood information from different levels.
Experiments on real datasets show that GTC exhibits superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-03-22T12:22:44Z)
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models [3.3865605512957453]
We present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities, evaluated with the Multi-Query Associative Recall task.
arXiv Detail & Related papers (2024-02-16T12:44:15Z)
- Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- The Closeness of In-Context Learning and Weight Shifting for Softmax Regression [42.95984289657388]
We study the in-context learning based on a softmax regression formulation.
We show that when self-attention-only Transformers are trained on fundamental regression tasks, the models learned by gradient descent and those learned by Transformers show great similarity.
arXiv Detail & Related papers (2023-04-26T04:33:41Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Shifted Chunk Transformer for Spatio-Temporal Representational Learning [24.361059477031162]
We construct a shifted chunk Transformer with pure self-attention blocks.
This Transformer can learn hierarchical-temporal features from a tiny patch to a global video clip.
It outperforms state-of-the-art approaches on Kinetics, Kinetics-600, UCF101, and HMDB51.
arXiv Detail & Related papers (2021-08-26T04:34:33Z)
- Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite its strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
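The sketch below accompanies the looped-Transformer entry at the top of this list. It runs plain multi-step gradient descent on an in-context least-squares problem, which is the computation that paper claims linear looped Transformers can implement, one loop per descent step. The dimensions, step size, and loop count here are illustrative assumptions, not values from that paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# In-context prompt: n exemplar pairs (x_i, y_i) produced by a hidden linear map.
d, n = 8, 32                          # n = O(d): a few times the input dimension
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true

# Each loop of a linear looped Transformer is claimed to act like one gradient
# step on the in-context objective L(w) = ||X w - y||^2 / (2 n).
w = np.zeros(d)                       # implicit weights carried across loops
eta = n / np.linalg.norm(X, 2) ** 2   # step size tied to the data's conditioning
for _ in range(100):                  # number of loops / descent steps (illustrative)
    grad = X.T @ (X @ w - y) / n
    w -= eta * grad

x_query = rng.normal(size=d)
print("query prediction error:", abs(x_query @ w - x_query @ w_true))
```

With well-conditioned data the error shrinks geometrically in the number of loops, which is the intuition behind the small-error claim quoted above.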
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including any of its content) and is not responsible for any consequences arising from its use.