Arithmetic in Transformers Explained
- URL: http://arxiv.org/abs/2402.02619v9
- Date: Fri, 14 Feb 2025 04:43:31 GMT
- Title: Arithmetic in Transformers Explained
- Authors: Philip Quirke, Clement Neo, Fazl Barez
- Abstract summary: We analyze 44 autoregressive transformer models trained on addition, subtraction, or both. We show that the addition models converge on a common logical algorithm, with most models achieving >99.999% prediction accuracy. We introduce a reusable library of mechanistic interpretability tools to define, locate, and visualize these algorithmic circuits.
- Score: 1.8434042562191815
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While recent work has shown transformers can learn addition, previous models exhibit poor prediction accuracy and are limited to small numbers. Furthermore, the relationship between single-task and multitask arithmetic capabilities remains unexplored. In this work, we analyze 44 autoregressive transformer models trained on addition, subtraction, or both. These include 16 addition-only models, 2 subtraction-only models, 8 "mixed" models trained to perform addition and subtraction, and 14 mixed models initialized with parameters from an addition-only model. The models span 5- to 15-digit questions, 2 to 4 attention heads, and 2 to 3 layers. We show that the addition models converge on a common logical algorithm, with most models achieving >99.999% prediction accuracy. We provide a detailed mechanistic explanation of how this algorithm is implemented within the network architecture. Subtraction-only models have lower accuracy. With the initialized mixed models, through parameter transfer experiments, we explore how multitask learning dynamics evolve, revealing that some features originally specialized for addition become polysemantic, serving both operations, and boosting subtraction accuracy. We explain the mixed algorithm mechanically. Finally, we introduce a reusable library of mechanistic interpretability tools to define, locate, and visualize these algorithmic circuits across multiple models.
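To make the experimental setup concrete, the following is a minimal sketch of how fixed-width n-digit addition and subtraction questions for an autoregressive transformer might be generated. The token format, padding scheme, and function names are illustrative assumptions, not the authors' actual data pipeline.

```python
import random

def make_example(n_digits: int, op: str) -> str:
    """Generate one fixed-width arithmetic question/answer string.

    Questions are zero-padded so every example has the same token length,
    mirroring the fixed n-digit setup described in the abstract. The exact
    token format here is an assumption, not the paper's pipeline.
    """
    a = random.randrange(10 ** n_digits)
    b = random.randrange(10 ** n_digits)
    if op == "+":
        result = a + b
    elif op == "-":
        result = a - b
    else:
        raise ValueError(f"unsupported operation: {op}")
    # Answers get one extra digit to hold a carry; negative results keep a sign.
    sign = "-" if result < 0 else "+"
    return f"{a:0{n_digits}d}{op}{b:0{n_digits}d}={sign}{abs(result):0{n_digits + 1}d}"

def make_dataset(n_examples: int, n_digits: int, ops=("+", "-")):
    """Build a mixed addition/subtraction dataset as plain strings."""
    return [make_example(n_digits, random.choice(ops)) for _ in range(n_examples)]

if __name__ == "__main__":
    random.seed(0)
    for line in make_dataset(3, 5):
        print(line)  # fixed-width strings such as 01234+05678=+006912
```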
Related papers
- Towards a unified and verified understanding of group-operation networks [0.8305049591788082]
We investigate the internals of one-hidden-layer neural networks trained on the binary operation of finite groups.
We produce a more complete description of such models in a step towards unifying the explanations of previous works.
arXiv Detail & Related papers (2024-10-09T23:02:00Z) - EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z) - Code Pretraining Improves Entity Tracking Abilities of Language Models [20.6768931196215]
We find clear evidence that models additionally trained on large amounts of code outperform the base models.
On the other hand, we find no consistent benefit of additional math training or alignment tuning across various model families.
arXiv Detail & Related papers (2024-05-31T17:56:33Z) - Understanding Addition in Transformers [2.07180164747172]
This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer addition.
Our findings suggest that the model dissects the task into parallel streams dedicated to individual digits, employing varied algorithms tailored to different positions within the digits.
arXiv Detail & Related papers (2023-10-19T19:34:42Z) - In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z) - Adapting Large Language Models for Content Moderation: Pitfalls in Data
Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we describe how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
arXiv Detail & Related papers (2023-06-15T10:48:59Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z) - Interpretable models for extrapolation in scientific machine learning [0.0]
Complex machine learning algorithms often outperform simple regressions in interpolative settings.
We examine the trade-off between model performance and interpretability across a broad range of science and engineering problems.
arXiv Detail & Related papers (2022-12-16T19:33:28Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Generalization Analysis on Learning with a Concurrent Verifier [16.298786827265673]
We analyze how the learnability of a machine learning model changes with a CV.
We show that typical error bounds based on Rademacher complexity will be no larger than those of the original model.
arXiv Detail & Related papers (2022-10-11T10:51:55Z) - Inter-model Interpretability: Self-supervised Models as a Case Study [0.2578242050187029]
We build on a recent interpretability technique called Dissect to introduce inter-model interpretability.
We project 13 top-performing self-supervised models into a Learned Concepts Embedding space that reveals proximities among models from the perspective of learned concepts.
The experiment allowed us to group the models into three categories and revealed, for the first time, the types of visual concepts that different tasks require.
arXiv Detail & Related papers (2022-07-24T22:50:18Z) - Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z) - QuantifyML: How Good is my Machine Learning Model? [0.0]
QuantifyML aims to quantify the extent to which machine learning models have learned and generalized from the given data.
The resulting logical formula is analyzed with off-the-shelf model counters to obtain precise counts with respect to different model behaviors.
arXiv Detail & Related papers (2021-10-25T01:56:01Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z) - Model-agnostic multi-objective approach for the evolutionary discovery
of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model and to identify which parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
In essence, real data points (or specific points of interest) are used, and the change in the prediction is observed after slightly raising or lowering specific features (a minimal sketch of this idea follows the list).
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
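The final entry above describes probing a fitted classifier by nudging individual features of real data points and observing how the predicted class changes. The snippet below is a minimal sketch of that general idea for any scikit-learn-style classifier; the step sizes and function name are illustrative assumptions, not the paper's actual quantile-shift procedure.

```python
import numpy as np

def perturbation_probe(model, x, feature_idx, deltas=(-0.1, 0.1)):
    """Nudge one feature of a single data point and report the predicted class.

    `model` is any object with a scikit-learn-style predict() method; the
    step sizes are arbitrary illustrative choices, not the quantile shifts
    used in the cited paper.
    """
    x = np.asarray(x, dtype=float)
    results = {0.0: model.predict(x.reshape(1, -1))[0]}  # unperturbed prediction
    for delta in deltas:
        x_shifted = x.copy()
        x_shifted[feature_idx] += delta
        results[delta] = model.predict(x_shifted.reshape(1, -1))[0]
    return results  # maps shift size -> predicted class

# Usage sketch (assumes a fitted classifier `clf` and a data point `x0`):
# print(perturbation_probe(clf, x0, feature_idx=2))
```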
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.