Learning the greatest common divisor: explaining transformer predictions
- URL: http://arxiv.org/abs/2308.15594v2
- Date: Thu, 14 Mar 2024 20:47:17 GMT
- Title: Learning the greatest common divisor: explaining transformer predictions
- Authors: François Charton
- Abstract summary: The predictions of small transformers can be fully characterized by looking at model inputs and outputs.
The model learns a list $\mathcal D$ of integers, products of divisors of the base used to represent integers and small primes, and predicts the largest element of $\mathcal D$ that divides both inputs.
- Score: 8.430481660019451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The predictions of small transformers, trained to calculate the greatest common divisor (GCD) of two positive integers, can be fully characterized by looking at model inputs and outputs. As training proceeds, the model learns a list $\mathcal D$ of integers, products of divisors of the base used to represent integers and small primes, and predicts the largest element of $\mathcal D$ that divides both inputs. Training distributions impact performance. Models trained from uniform operands only learn a handful of GCD (up to $38$ GCD $\leq100$). Log-uniform operands boost performance to $73$ GCD $\leq 100$, and a log-uniform distribution of outcomes (i.e. GCD) to $91$. However, training from uniform (balanced) GCD breaks explainability.
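The prediction rule characterized in the abstract can be sketched in a few lines of Python. The construction of $\mathcal D$ below (divisors of the representation base, combined with small primes and capped at $100$) is an illustrative assumption for demonstration, not the set a trained model actually learns:

```python
import math

def build_D(base=10, small_primes=(2, 3, 5), limit=100):
    """Illustrative stand-in for the learned set D: divisors of the
    representation base, optionally multiplied by small primes."""
    D = {d for d in range(1, base + 1) if base % d == 0}
    for p in small_primes:
        D |= {d * p for d in D if d * p <= limit}
    return sorted(D)

def model_prediction(a, b, D):
    """The abstract's characterization of the model: predict the
    largest element of D that divides both inputs (1 is always in D)."""
    return max(d for d in D if a % d == 0 and b % d == 0)

D = build_D()
print(model_prediction(12, 18, D))                    # 6, the true gcd (6 = 2*3 is in D)
print(math.gcd(14, 21), model_prediction(14, 21, D))  # 7 1: the true gcd 7 is not in D
```

The second call shows the failure mode the abstract implies: whenever the true GCD is not in $\mathcal D$ (here, the prime $7$), the model falls back to the largest common element of $\mathcal D$, and richer training distributions correspond to a larger effective $\mathcal D$.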
Related papers
- Models That Prove Their Own Correctness [2.6570606951261015]
We train Self-Proving models that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof.
With high probability over a random input, the model generates a correct output *and* successfully proves its correctness to $V$.
Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.
arXiv Detail & Related papers (2024-05-24T17:10:08Z) - Low-Complexity Integer Divider Architecture for Homomorphic Encryption [5.857929080874288]
Homomorphic encryption (HE) allows computations to be directly carried out on ciphertexts and enables privacy-preserving cloud computing.
An algorithm is proposed to compute the quotient, and rigorous mathematical proofs are provided.
arXiv Detail & Related papers (2024-01-19T23:53:59Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Length Generalization in Arithmetic Transformers [41.62455986786115]
We show how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to longer sequences than seen during training.
We propose train set priming: adding a few ($10$ to $50$) long sequences to the training set.
We show that priming allows models trained on $5$-digit $times$ $3$-digit multiplications to generalize to $35times 3$ examples.
arXiv Detail & Related papers (2023-06-27T11:53:25Z) - Few-shot Mining of Naturally Occurring Inputs and Outputs [83.3871936721431]
We mine input output examples from large corpora using a supervised mining function trained using a small seed set of only 100 examples.
Unlike model-generated data augmentation, our method mines naturally occurring high-quality input output pairs to mimic the style of the seed set for multiple tasks.
On SQuAD-style reading comprehension, augmenting the seed set with the mined data results in an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set.
arXiv Detail & Related papers (2022-05-09T05:40:52Z) - Learning Division with Neural Arithmetic Logic Modules [2.019622939313173]
We show that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers.
We propose two novel approaches for division which we call the Neural Reciprocal Unit (NRU) and the Neural Multiplicative Reciprocal Unit (NMRU)
arXiv Detail & Related papers (2021-10-11T11:56:57Z) - Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called *under-bagging* $k$-NN for imbalanced classification problems.
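The under-bagging idea named in this summary can be illustrated generically: each bag keeps every minority-class sample plus an equal-size random subsample of the majority class, a $k$-NN classifier votes per bag, and the bags' votes are aggregated. All names, data, and hyperparameters below are illustrative, not the paper's algorithm:

```python
import random
from collections import Counter

def knn_predict(train, x, k=3):
    """Plain k-NN vote on 1-D points; train is a list of (value, label)."""
    nearest = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def under_bagging_knn(data, x, n_bags=25, k=3, seed=0):
    """Majority vote of k-NN classifiers, each fit on a class-balanced bag."""
    rng = random.Random(seed)
    minority = [t for t in data if t[1] == 1]
    majority = [t for t in data if t[1] == 0]
    votes = []
    for _ in range(n_bags):
        # under-sample the majority class down to the minority size
        bag = minority + rng.sample(majority, len(minority))
        votes.append(knn_predict(bag, x, k))
    return Counter(votes).most_common(1)[0][0]

# Toy imbalanced data: 30 majority points near 0..29, 3 minority points near 100.
data = [(float(v), 0) for v in range(30)] + [(99.5, 1), (100.0, 1), (100.5, 1)]
print(under_bagging_knn(data, 100.2))  # 1: every bag's nearest neighbors are minority
print(under_bagging_knn(data, 5.0))    # 0: sampled majority points are always closer
```

Balancing each bag keeps the rare class from being outvoted inside every neighborhood, which is the standard motivation for under-bagging on imbalanced data.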
arXiv Detail & Related papers (2021-09-01T14:10:38Z) - Learning elliptic partial differential equations with randomized linear algebra [2.538209532048867]
We show that one can construct an approximant to $G$ that converges almost surely.
The quantity $0 \leq \Gamma_\epsilon \leq 1$ characterizes the quality of the training dataset.
arXiv Detail & Related papers (2021-01-31T16:57:59Z) - Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t\, C(\mathcal{F})$.
arXiv Detail & Related papers (2020-06-20T20:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.