Arbitrary-Length Generalization for Addition in a Tiny Transformer
- URL: http://arxiv.org/abs/2406.00075v2
- Date: Wed, 12 Jun 2024 03:40:35 GMT
- Title: Arbitrary-Length Generalization for Addition in a Tiny Transformer
- Authors: Alexandre Galvao Patriota
- Abstract summary: This paper introduces a novel training methodology that enables a Transformer model to generalize two-digit addition to numbers with unseen digit lengths.
The proposed approach employs an autoregressive generation technique, processing from right to left, which mimics a common manual method for adding large numbers.
- Score: 55.2480439325792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel training methodology that enables a Transformer model to generalize the addition of two-digit numbers to numbers with unseen lengths of digits. The proposed approach employs an autoregressive generation technique, processing from right to left, which mimics a common manual method for adding large numbers. To the best of my knowledge, this methodology has not been previously explored in the literature. All results are reproducible, and the corresponding R code is available at github.com/AGPatriota/ALGA-R/.
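The right-to-left generation scheme can be made concrete with a small data-formatting sketch. This is a minimal illustration in Python (the paper's released code is in R), and the token layout, including the `format_example` helper, is an assumption rather than the paper's exact format:

```python
def format_example(a: int, b: int) -> str:
    """Format an addition problem so the answer is generated
    right-to-left (least-significant digit first), mirroring manual
    column addition. Token layout is illustrative only."""
    reversed_answer = str(a + b)[::-1]  # units digit comes first
    return f"{a}+{b}=" + reversed_answer

# 57+68 = 125, emitted as "521": each next digit depends only on
# digits already consumed plus the running carry.
print(format_example(57, 68))  # -> 57+68=521
```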
Related papers
- How to Leverage Digit Embeddings to Represent Numbers? [13.880400817682059]
Generalisation, such as solving 100+200 when trained only on problems like 1+2, can substantially affect model performance.
Character-level embeddings of numbers have emerged as a promising approach to improve number representation.
We use mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models.
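As a rough sketch of what such an aggregate might look like, the snippet below combines per-digit embeddings using a place-value prior; the weighting scheme and the `aggregate_number` helper are hypothetical, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
digit_emb = rng.normal(size=(10, 8))  # one 8-dim embedding per digit 0-9

def aggregate_number(n: int) -> np.ndarray:
    """Combine digit embeddings into one number embedding, weighting
    each digit by a place-value prior (hypothetical scheme)."""
    digits = [int(d) for d in str(n)]
    weights = [10.0 ** (len(digits) - 1 - i) for i in range(len(digits))]
    vecs = [w * digit_emb[d] for w, d in zip(weights, digits)]
    return np.sum(vecs, axis=0) / np.sum(weights)

print(aggregate_number(203).shape)  # (8,)
```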
arXiv Detail & Related papers (2024-07-01T01:31:41Z)
- Reverse That Number! Decoding Order Matters in Arithmetic Learning [49.5504492920404]
Our work introduces a novel strategy that reevaluates the digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
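The intuition is that with least-significant-digit-first output, each digit of the sum is determined by the operand digits read so far plus a running carry. A minimal worked sketch (not the paper's code):

```python
def lsd_first_digits(a: int, b: int) -> list[str]:
    """Emit the sum's digits least-significant first, tracking the
    carry explicitly: each output digit depends only on operand
    digits already consumed plus the running carry."""
    xs, ys = str(a)[::-1], str(b)[::-1]
    out, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        s = carry
        s += int(xs[i]) if i < len(xs) else 0
        s += int(ys[i]) if i < len(ys) else 0
        out.append(str(s % 10))
        carry = s // 10
    if carry:
        out.append(str(carry))
    return out

print(lsd_first_digits(857, 468))  # ['5', '2', '3', '1'], i.e. 1325 reversed
```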
arXiv Detail & Related papers (2024-03-09T09:04:53Z)
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoder network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
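A toy sketch of the two-stage idea, with hard-coded stand-ins for the network outputs (GEC-DePenD's actual permutation and decoder networks are learned; this only illustrates the decoupling):

```python
# Stage 1: a permutation step decides token order; stage 2 fills or
# replaces slots in parallel. Both predictions are hard-coded here.
src = ["he", "go", "to", "school", "yesterday"]
perm = [0, 1, 2, 3, 4]            # stage 1: predicted source-token order
edits = {1: "went"}               # stage 2: parallel per-slot corrections
out = [edits.get(i, src[j]) for i, j in enumerate(perm)]
print(" ".join(out))              # he went to school yesterday
```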
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
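One common form of sparse attention combines a local sliding window with a handful of global tokens; the sketch below builds such a boolean mask. The specific pattern and parameters are assumptions for illustration, not HETFORMER's exact multi-granularity design:

```python
import numpy as np

def sparse_mask(n: int, window: int, globals_: list[int]) -> np.ndarray:
    """Boolean attention mask combining a sliding local window with a
    few global tokens (illustrative sparse-attention pattern)."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True           # local neighborhood
    mask[:, globals_] = True            # everyone attends to globals
    mask[globals_, :] = True            # globals attend to everyone
    return mask

m = sparse_mask(8, window=1, globals_=[0])
print(m.sum(), "of", m.size, "entries attended")
```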
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Long-Span Dependencies in Transformer-based Summarization Systems [38.672160430296536]
Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization.
One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows.
In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization.
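The scaling problem is easy to quantify: full self-attention materializes an n-by-n score matrix per head, so memory grows quadratically with input length. A quick back-of-the-envelope check:

```python
# fp32 attention-score memory per head for growing input lengths
for n in (512, 4096, 32768):
    mib = n * n * 4 / 2**20  # n x n scores, 4 bytes each
    print(f"n={n:>6}: {mib:10.1f} MiB per head")
```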
arXiv Detail & Related papers (2021-05-08T23:53:03Z)
- Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer [2.952085248753861]
A transformer decoder is employed to replace RNN-based decoders.
Experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods on CROHME 2014 by 2.23%.
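A simplified view of bidirectional training is to give one decoder both left-to-right and right-to-left target sequences; the special tokens below are placeholders, and the real model's scheme may differ:

```python
def bidirectional_targets(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Build L2R and R2L target sequences so a single decoder is
    trained in both directions (placeholder special tokens)."""
    l2r = ["<sos>"] + tokens + ["<eos>"]
    r2l = ["<eos>"] + tokens[::-1] + ["<sos>"]
    return l2r, r2l

l2r, r2l = bidirectional_targets(["\\frac", "{", "a", "}", "{", "b", "}"])
print(r2l)  # ['<eos>', '}', 'b', '{', '}', 'a', '{', '\\frac', '<sos>']
```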
arXiv Detail & Related papers (2021-05-06T03:11:54Z)
- LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring [55.16665077221941]
We propose a novel rescoring approach, which processes the entire lattice in a single call to the model.
The key feature of our rescoring policy is a novel non-autoregressive Lattice Transformer Language Model (LT-LM).
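The single-call contrast can be sketched with a toy word lattice: n-best rescoring needs one model call per hypothesis, while single-shot lattice rescoring scores every arc at once. The lattice encoding below is illustrative only:

```python
# Toy word lattice as adjacency lists: node -> [(word, next_node), ...]
lattice = {
    0: [("a", 1), ("the", 1)],
    1: [("cat", 2), ("cap", 2)],
    2: [],
}
arcs = [(s, w, t) for s, outs in lattice.items() for w, t in outs]
# 2 x 2 = 4 full hypotheses, but only 4 arcs: one non-autoregressive
# call can score them all, vs 4 per-hypothesis rescoring calls.
print(f"{len(arcs)} arcs scored in 1 call vs 4 per-hypothesis calls")
```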
arXiv Detail & Related papers (2021-04-06T14:06:07Z)
- Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size [41.624797099537375]
We present a novel method for adding recurrence to pretrained transformer language models.
We find that our method attains better perplexity than an unmodified GPT-2 model on the PG-19 and WikiText-103 corpora.
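A crude sketch of the general idea, using toy vectors in place of the pretrained model: process the input in fixed-size windows and carry a small recurrent summary between them. The update rule and names here are invented for illustration, not the paper's method:

```python
import numpy as np

def chunked_forward(tokens: list[int], chunk: int = 4) -> list[np.ndarray]:
    """Process a sequence in fixed-size windows, carrying a recurrent
    summary vector between windows so usable context can exceed a
    single window (toy stand-in for the pretrained transformer)."""
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 8))              # toy token embeddings
    state, outputs = np.zeros(8), []
    for i in range(0, len(tokens), chunk):
        window = emb[tokens[i:i + chunk]] + state  # condition on summary
        pooled = window.mean(axis=0)
        outputs.append(pooled)
        state = 0.5 * state + 0.5 * pooled       # invented update rule
    return outputs

print(len(chunked_forward(list(range(10)))))  # 3 windows
```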
arXiv Detail & Related papers (2020-08-16T23:19:30Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
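A toy version of progressive insertion: start from the hard lexical constraints and, in each round, insert new tokens into gaps in parallel. The per-round insertions below are hard-coded stand-ins for model predictions:

```python
def insertion_round(tokens: list[str], inserts: dict[int, str]) -> list[str]:
    """One parallel insertion round: place a new token into each gap
    listed in `inserts` (gap i is between tokens[i-1] and tokens[i])."""
    out = []
    for i, tok in enumerate(tokens):
        if i in inserts:
            out.append(inserts[i])
        out.append(tok)
    if len(tokens) in inserts:
        out.append(inserts[len(tokens)])
    return out

seq = ["cat", "mat"]                              # hard lexical constraints
seq = insertion_round(seq, {1: "on"})             # coarse round
seq = insertion_round(seq, {0: "the", 1: "sat", 2: "the"})  # finer round
print(" ".join(seq))  # the cat sat on the mat
```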
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.