Related papers: AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

URL: http://arxiv.org/abs/2402.13572v2
Date: Fri, 10 Jan 2025 09:11:39 GMT
Title: AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures
Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu
Abstract summary: We design a novel transformer framework, dubbed AlgoFormer, to empower transformers with algorithmic capabilities. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing. Some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.
Score: 80.28359222380733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Besides natural language processing, transformers exhibit extraordinary performance in solving broader applications, including scientific computing and computer vision. Previous works try to explain this from the expressive power and capability perspectives that standard transformers are capable of performing some algorithms. To empower transformers with algorithmic capabilities and motivated by the recently proposed looped transformer, we design a novel transformer framework, dubbed Algorithm Transformer (abbreviated as AlgoFormer). We provide an insight that efficient transformer architectures can be designed by leveraging prior knowledge of tasks and the underlying structure of potential algorithms. Compared with the standard transformer and vanilla looped transformer, the proposed AlgoFormer can perform efficiently in algorithm representation in some specific tasks. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing, a looped transformer for iterative optimization algorithms, and a post-transformer for producing the desired results after post-processing. We provide theoretical evidence of the expressive power of the AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning. Experimental results demonstrate the empirical superiority of the proposed transformer in that it outperforms the standard transformer and vanilla looped transformer in some specific tasks. An extensive experiment on real language tasks (e.g., neural machine translation of German and English, and text classification) further validates the expressiveness and effectiveness of AlgoFormer.
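
The three-stage structure described in the abstract (a pre-transformer, a looped transformer, and a post-transformer) can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python functions stand in for transformer blocks, and the names `algoformer_forward`, `pre_block`, `looped_block`, and `post_block`, as well as the toy gradient-descent task, are assumptions chosen only to show the control flow.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def algoformer_forward(pre: Block, looped: Block, post: Block,
                       x: Vector, num_loops: int) -> Vector:
    """Apply `pre` once, the same `looped` block `num_loops` times, then `post` once."""
    h = pre(x)
    for _ in range(num_loops):
        h = looped(h)  # the same block (shared parameters) is reused each iteration
    return post(h)


# Hypothetical stand-ins for the three stages (illustration only):
# minimise (w - target)^2 by gradient descent. The state is [w, target].
def pre_block(x: Vector) -> Vector:
    # "Preprocessing": pair an initial iterate w = 0 with the target value.
    return [0.0, x[0]]


def looped_block(h: Vector) -> Vector:
    # One gradient-descent step on (w - target)^2 with step size 0.25.
    w, target = h
    return [w - 0.25 * 2.0 * (w - target), target]


def post_block(h: Vector) -> Vector:
    # "Post-processing": read out the final iterate only.
    return [h[0]]


result = algoformer_forward(pre_block, looped_block, post_block, [3.0], num_loops=20)
print(result)  # close to [3.0] after 20 iterations
```

In this sketch the loop count controls how many optimization steps are taken without adding parameters, which is the sense in which a looped block can stand in for an iterative algorithm; a real model would replace each function with a trained transformer block.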

Related papers

Discovering Interpretable Algorithms by Decompiling Transformers to RASP [58.057999683173655]
Recent work has shown that the computations of Transformers can be simulated in the RASP family of programming languages. We present a general method to extract simple interpretable programs from trained Transformers. Our results provide the most direct evidence so far that Transformers internally implement simple RASP programs.
arXiv Detail & Related papers (2026-02-09T16:22:29Z)
Transformers Meet In-Context Learning: A Universal Approximation Theory [25.109347875620436]
We develop a universal approximation theory to better understand how transformers enable in-context learning. Our work sheds light on how transformers can simultaneously learn general-purpose representations and adapt dynamically to in-context examples.
arXiv Detail & Related papers (2025-06-05T16:12:51Z)
Enhancing Transformers for Generalizable First-Order Logical Entailment [51.04944136538266]
This paper investigates the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge. The first-order reasoning capability of transformers is assessed through their ability to perform first-order logical entailment. We propose a more sophisticated, logic-aware architecture, TEGA, to enhance the capability for generalizable first-order logical entailment in transformers.
arXiv Detail & Related papers (2025-01-01T07:05:32Z)
Transformers are Expressive, But Are They Expressive Enough for Regression? [38.369337945109855]
We show that Transformers struggle to reliably approximate smooth functions, relying on piecewise constant approximations with sizable intervals. By shedding light on these challenges, we advocate a refined understanding of Transformers' capabilities.
arXiv Detail & Related papers (2024-02-23T18:12:53Z)
Looped Transformers are Better at Learning Learning Algorithms [16.98720552888865]
We propose the utilization of looped transformer architecture and its associated training methodology. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer.
arXiv Detail & Related papers (2023-11-21T08:32:38Z)
Linear attention is (maybe) all you need (to understand transformer optimization) [55.81555204646486]
We make progress towards understanding the subtleties of training Transformers by studying a simple yet canonicalized shallow Transformer model. Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of Transformer training dynamics.
arXiv Detail & Related papers (2023-10-02T10:48:42Z)
A Survey of Techniques for Optimizing Transformer Inference [3.6258657276072253]
Recent years have seen a phenomenal rise in performance and applications of transformer neural networks. Transformer-based networks such as ChatGPT have impacted the lives of common men. Researchers have proposed techniques to optimize transformer inference at all levels of abstraction.
arXiv Detail & Related papers (2023-07-16T08:50:50Z)
Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design. Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization. The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input [50.83356836818667]
We study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs. Our theoretical results support the practical success of Transformers for high dimensional data.
arXiv Detail & Related papers (2023-05-30T02:44:49Z)
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions. We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition. These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z)
A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities [54.039855851891815]
Transformer-based models have demonstrated state-of-the-art performance in many intelligent coding tasks. We empirically study the effect of semantic-preserving code transformation on the performance of Transformer.
arXiv Detail & Related papers (2022-07-09T15:02:39Z)
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed as LW-Transformer. We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets. Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language. We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer. We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation. We propose a novel scalable Transformers, which naturally contains sub-Transformers of different scales and have shared parameters. A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.