Related papers: Looped Transformers are Better at Learning Learning Algorithms

Looped Transformers are Better at Learning Learning Algorithms

URL: http://arxiv.org/abs/2311.12424v3
Date: Sat, 16 Mar 2024 21:10:38 GMT
Title: Looped Transformers are Better at Learning Learning Algorithms
Authors: Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos,
Abstract summary: We propose the utilization of looped transformer architecture and its associated training methodology. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer.
Score: 16.98720552888865
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers have demonstrated effectiveness in in-context solving data-fitting problems from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture presents a challenge in emulating the iterative algorithms, which are commonly employed in traditional machine learning methods. To address this, we propose the utilization of looped transformer architecture and its associated training methodology, with the aim of incorporating iterative characteristics into the transformer architectures. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer in solving various data-fitting problems, while utilizing less than 10% of the parameter count.

Related papers

RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals [2.287772422489548]
We propose RingFormer, which employs one Transformer layer that processes input repeatedly in a circular, ring-like manner. This allows us to reduce the model parameters substantially while maintaining high performance in a variety of tasks such as translation and image classification.
arXiv Detail & Related papers (2025-02-18T09:34:31Z)
Learning Spectral Methods by Transformers [18.869174453242383]
We show that multi-layered Transformers, given a sufficiently large set of pre-training instances, are able to learn the algorithms themselves. This learning paradigm is distinct from the in-context learning setup and is similar to the learning procedure of human brains.
arXiv Detail & Related papers (2025-01-02T15:53:25Z)
A Comparative Study on Code Generation with Transformers [0.0]
This paper introduces the concept of a "A Comparative Study on Code Generation with Transformers" A model based on Transformer architecture, and NLP methodologies to automatically generate C++ source code for different varieties of problems.
arXiv Detail & Related papers (2024-12-07T21:18:23Z)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption. We analyze how magnitude-based models affect generalization while improving adaption. We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
Linear Transformers are Versatile In-Context Learners [19.988368693379087]
We prove that each layer of a linear transformer maintains a weight vector for an implicit linear regression problem. We also investigate the use of linear transformers in a challenging scenario where the training data is corrupted with different levels of noise. Remarkably, we demonstrate that for this problem linear transformers discover an intricate and highly effective optimization algorithm.
arXiv Detail & Related papers (2024-02-21T23:45:57Z)
On the Expressive Power of a Variant of the Looped Transformer [83.30272757948829]
We design a novel transformer block, dubbed AlgoFormer, to empower transformers with algorithmic capabilities. The proposed AlgoFormer can achieve significantly higher in algorithm representation when using the same number of parameters. Some theoretical and empirical results are presented to show that the designed transformer has the potential to be smarter than human-designed algorithms.
arXiv Detail & Related papers (2024-02-21T07:07:54Z)
SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling [1.0128808054306186]
We propose a novel sparsification scheme for the Transformer that integrates convolution filters and the flood filling method. Our sparsification approach reduces the computational complexity and memory footprint of the Transformer during training. New SPION achieves up to 3.08X speedup over existing state-of-the-art sparse Transformer models.
arXiv Detail & Related papers (2023-09-22T02:14:46Z)
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL. We show that transformers can implement a broad class of standard machine learning algorithms in context. A emphsingle transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design. Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization. The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications. The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate. There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.