ALTA: Compiler-Based Analysis of Transformers
- URL: http://arxiv.org/abs/2410.18077v1
- Date: Wed, 23 Oct 2024 17:58:49 GMT
- Authors: Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova
- Abstract summary: We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights.
ALTA is inspired by RASP, a language proposed by Weiss et al. (2021).
We show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks.
- Score: 56.76482035060707
- Abstract: We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler from RASP programs to Transformer weights. ALTA complements and extends this prior work, offering the ability to express loops and to compile programs to Universal Transformers, among other advantages. ALTA allows us to constructively show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks, without requiring intermediate scratchpad decoding steps. We also propose tools to analyze cases where the expressibility of an algorithm is established, but end-to-end training on a given training set fails to induce behavior consistent with the desired algorithm. To this end, we explore training from ALTA execution traces as a more fine-grained supervision signal. This enables additional experiments and theoretical analyses relating the learnability of various algorithms to data availability and modeling decisions, such as positional encodings. We make the ALTA framework -- language specification, symbolic interpreter, and weight compiler -- available to the community to enable further applications and insights.
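To make the compiled-algorithm idea concrete, here is a minimal sketch in plain Python (not actual ALTA syntax; the decomposition into a per-position state plus one shared update rule is an illustrative assumption based on the abstract) of a length-invariant parity computation: the same local rule is applied a number of times that depends on input length, the way a compiled Universal Transformer reapplies its shared layer.

```python
# Illustrative sketch only (plain Python, *not* ALTA's actual syntax):
# parity via one shared per-position update rule applied in a loop,
# mirroring how a Universal Transformer reapplies the same layer.

def parity_program(tokens: list[int]) -> int:
    """Length-invariant parity of a 0/1 sequence via iterated local updates."""
    n = len(tokens)
    # Per-position state: cumulative parity of the prefix ending here.
    # Initially each position knows only its own bit.
    state = list(tokens)
    # Apply the same rule n-1 times; a loop count that grows with length
    # is one capability the abstract highlights over prior compilers.
    for _ in range(n - 1):
        new_state = state[:]
        for i in range(1, n):
            # Each position "attends" to its left neighbor and folds in
            # that neighbor's running parity.
            new_state[i] = state[i - 1] ^ tokens[i]
        state = new_state
    return state[-1]  # the last position holds the parity of the whole input

assert parity_program([1, 0, 1, 1]) == 1
assert parity_program([1, 0, 1, 1, 1]) == 0
```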
Related papers
- Transformers are Efficient Compilers, Provably [11.459397066286822]
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks.
In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective.
We introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages.
arXiv Detail & Related papers (2024-10-07T20:31:13Z)
- Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual rewriting and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The resulting Transformer Programs automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
- ProgSG: Cross-Modality Representation Learning for Programs in Electronic Design Automation [38.023395256208055]
High-level synthesis (HLS) allows a developer to compile a high-level description, written as software code in C or C++, into a hardware design.
HLS tools still require microarchitecture decisions, expressed in terms of pragmas.
We propose ProgSG, which allows the source-code sequence modality and the graph modality to interact in a deep and fine-grained way.
arXiv Detail & Related papers (2023-05-18T09:44:18Z)
- Planning with Large Language Models for Code Generation [100.07232672883897]
Planning-Guided Transformer Decoding (PG-TD) uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs.
We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks.
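As a rough illustration of the planning idea (a toy sketch, not the authors' PG-TD implementation; `next_token_probs` and `reward` are hypothetical stand-ins for a real language model and for test execution), the decoder below scores each candidate token by greedily rolling out a short continuation and evaluating it with an external reward:

```python
# Toy sketch of planning-guided decoding (not the paper's PG-TD code):
# score candidate next tokens by rolling out a short greedy continuation
# and evaluating it with an external reward, e.g. unit-test results.

VOCAB = ["a", "b", "<eos>"]

def next_token_probs(prefix: list[str]) -> dict[str, float]:
    # Stand-in for a real language model; uniform just to keep this runnable.
    return {t: 1.0 / len(VOCAB) for t in VOCAB}

def reward(program: list[str]) -> float:
    # Stand-in for running tests; here it rewards alternating a/b tokens.
    s = [t for t in program if t != "<eos>"]
    return sum(1.0 for x, y in zip(s, s[1:]) if x != y)

def rollout(prefix: list[str], depth: int) -> list[str]:
    """Greedy continuation used as the lookahead inside the planner."""
    out = list(prefix)
    for _ in range(depth):
        if out and out[-1] == "<eos>":
            break
        probs = next_token_probs(out)
        out.append(max(probs, key=probs.get))
    return out

def plan_decode(max_len: int = 6, depth: int = 3) -> list[str]:
    prefix: list[str] = []
    while len(prefix) < max_len and (not prefix or prefix[-1] != "<eos>"):
        # Choose the token whose greedy lookahead continuation scores best.
        best = max(VOCAB, key=lambda t: reward(rollout(prefix + [t], depth)))
        prefix.append(best)
    return prefix

print(plan_decode())  # ['b', 'a', 'b', 'a', 'b', 'a']
```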
arXiv Detail & Related papers (2023-03-09T18:59:47Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
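For a flavor of what such a program looks like, here is a small Python rendering of the histogram construction, written from the paper's description rather than taken from the official RASP interpreter (the primitive names `select` and `selector_width` follow the paper; the argument order and Python encoding are assumptions):

```python
# Python rendering of the RASP-style histogram construction
# (written from the description; not the official RASP interpreter).

def select(queries, keys, predicate):
    """Selector matrix: sel[i][j] is True iff position i attends to j."""
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(sel):
    """Per-position count of selected positions (attention 'width')."""
    return [sum(row) for row in sel]

def histogram(tokens):
    # Each position attends to every position holding the same token,
    # then counts them, yielding per-position occurrence counts.
    same = select(tokens, tokens, lambda k, q: k == q)
    return selector_width(same)

print(histogram(list("hello")))  # [1, 1, 2, 2, 1]
```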
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem: no parallel data is available for the seven test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)