How to Leverage Digit Embeddings to Represent Numbers?
- URL: http://arxiv.org/abs/2407.00894v1
- Date: Mon, 1 Jul 2024 01:31:41 GMT
- Title: How to Leverage Digit Embeddings to Represent Numbers?
- Authors: Jasivan Alex Sivakumar, Nafise Sadat Moosavi
- Abstract summary: Generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance.
Character-level embeddings of numbers have emerged as a promising approach to improve number representation.
We use mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models.
- Score: 13.880400817682059
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Apart from performing arithmetic operations, understanding numbers themselves is still a challenge for existing language models. Simple generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance (Sivakumar and Moosavi, 2023). Among various techniques, character-level embeddings of numbers have emerged as a promising approach to improve number representation. However, this method has limitations as it leaves the task of aggregating digit representations to the model, which lacks direct supervision for this process. In this paper, we explore the use of mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models. This can be achieved either by adding a special token to the input embeddings or by introducing an additional loss function to enhance correct predictions. We evaluate the effectiveness of incorporating this explicit aggregation, analysing its strengths and shortcomings, and discuss future directions to better benefit from this approach. Our methods, while simple, are compatible with any pretrained model and require only a few lines of code, which we have made publicly available.
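The abstract only sketches the method at a high level; the snippet below is a minimal illustration of the idea, assuming a place-value (powers-of-ten) prior for aggregating per-digit embeddings into a single vector that can be prepended as an extra "number token" or used as the target of an additional loss. All names here (e.g. `digit_embedding`, `aggregate_number`) are hypothetical and do not come from the released code.

```python
import torch

# Hypothetical per-digit embedding table (digits 0-9, model dimension d_model).
d_model = 16
digit_embedding = torch.nn.Embedding(10, d_model)

def aggregate_number(number: str) -> torch.Tensor:
    """Aggregate digit embeddings with a place-value prior: each digit's
    embedding is weighted by its (normalised) power of ten."""
    digits = torch.tensor([int(c) for c in number])                    # "205" -> [2, 0, 5]
    places = torch.tensor([10.0 ** i for i in range(len(number) - 1, -1, -1)])
    weights = places / places.sum()                                    # weights sum to 1
    return (weights.unsqueeze(-1) * digit_embedding(digits)).sum(dim=0)

# The aggregate can be prepended to the input embeddings as a special token,
# or compared against a prediction in an additional loss term.
print(aggregate_number("205").shape)  # torch.Size([16])
```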
Related papers
- Interleaving Text and Number Embeddings to Solve Mathematics Problems [0.0]
We build upon a recent approach by introducing more expressive numerical embeddings.
Our method addresses key shortcomings, including the elimination of numerical artefacts and the ability to handle a wide range of magnitudes without clipping.
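The summary does not spell out the embedding itself; as a generic stand-in, the sketch below uses sign and log-magnitude features projected to the model dimension, which is one common way to cover many orders of magnitude without clipping. It is not the paper's actual construction.

```python
import math
import torch

d_model = 16
num_proj = torch.nn.Linear(2, d_model)  # hypothetical projection of numeric features

def number_embedding(x: float) -> torch.Tensor:
    """Magnitude-aware embedding from sign and log1p(|x|) features, so very
    large and very small values remain in a numerically comfortable range."""
    feats = torch.tensor([math.copysign(1.0, x), math.log1p(abs(x))])
    return num_proj(feats)

print(number_embedding(3.0).shape)      # torch.Size([16])
print(number_embedding(-2.5e9).shape)   # large magnitudes need no clipping
```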
arXiv Detail & Related papers (2024-10-25T07:21:57Z) - A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z) - Reverse That Number! Decoding Order Matters in Arithmetic Learning [49.5504492920404]
Our work introduces a novel strategy that reevaluates the digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
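The training format is not reproduced in this summary, but the least-significant-digit-first idea itself is easy to illustrate: reverse the digit strings before tokenisation so the model can emit carries first, and reverse the generated answer back at the end.

```python
def reverse_digits(number: str) -> str:
    """Write a number least-significant digit first, e.g. '125' -> '521'."""
    return number[::-1]

# Training example for 57 + 68 = 125, with every number reversed so the model
# produces the answer starting from the units digit (where the carry originates).
prompt = f"{reverse_digits('57')} + {reverse_digits('68')} ="
target = reverse_digits("125")
print(prompt, target)   # 75 + 86 = 521

# At inference time, the generated digits are simply reversed back:
print(target[::-1])     # 125
```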
arXiv Detail & Related papers (2024-03-09T09:04:53Z) - Estimating Numbers without Regression [30.79061214333164]
Despite recent successes in language models, their ability to represent numbers is insufficient.
Subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks.
We show that changing the model's vocabulary instead (e.g., introducing a new token for numbers in the range 10-100) is a far better trade-off.
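A rough sketch of that vocabulary change: map each number to a token naming its order-of-magnitude bucket, so that, for example, one token covers everything from 10 to 100. The bucket naming below is illustrative rather than the paper's exact scheme.

```python
import math

def magnitude_token(x: float) -> str:
    """Map a number to a coarse range token, e.g. 42 -> '<NUM_1e1_1e2>' (covers 10-100)."""
    if x == 0:
        return "<NUM_0>"
    exponent = math.floor(math.log10(abs(x)))
    sign = "NEG_" if x < 0 else ""
    return f"<{sign}NUM_1e{exponent}_1e{exponent + 1}>"

print(magnitude_token(42))      # <NUM_1e1_1e2>
print(magnitude_token(0.003))   # <NUM_1e-3_1e-2>
print(magnitude_token(-7500))   # <NEG_NUM_1e3_1e4>
```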
arXiv Detail & Related papers (2023-10-09T23:07:05Z) - Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models [8.166629393064097]
The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for deep learning models.
Recent work shows that this limitation persists in state-of-the-art Transformer-based models.
We demonstrate that large language models can succeed in extrapolation without modifying their architecture or training procedure.
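The exact markup scheme is described in the paper itself; as a toy illustration of the general idea, the helper below just interleaves explicit index tokens with the items of a list, giving the model positional anchors it can reuse on longer inputs than it saw during prompting.

```python
def interleave_markup(items):
    """Interleave index markup tokens with list items,
    e.g. [12, 7, 30] -> '<0> 12 <1> 7 <2> 30'."""
    return " ".join(f"<{i}> {item}" for i, item in enumerate(items))

print(interleave_markup([12, 7, 30]))  # <0> 12 <1> 7 <2> 30
```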
arXiv Detail & Related papers (2022-08-24T11:25:27Z) - Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z) - Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
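The two auxiliary tasks themselves are not detailed in this summary; the generic training recipe they imply (main sequence loss plus weighted auxiliary prediction losses) can be sketched as below, with placeholder heads, targets, and weights.

```python
import torch

def combined_loss(main_logits, main_targets,
                  aux1_logits, aux1_targets,
                  aux2_logits, aux2_targets,
                  w1: float = 0.5, w2: float = 0.5) -> torch.Tensor:
    """Main seq2seq loss plus two auxiliary sequence prediction losses
    (e.g. tracking function and argument semantics), combined with fixed weights."""
    ce = torch.nn.functional.cross_entropy
    loss_main = ce(main_logits.flatten(0, 1), main_targets.flatten())
    loss_aux1 = ce(aux1_logits.flatten(0, 1), aux1_targets.flatten())
    loss_aux2 = ce(aux2_logits.flatten(0, 1), aux2_targets.flatten())
    return loss_main + w1 * loss_aux1 + w2 * loss_aux2

# Shapes: logits are (batch, seq_len, vocab), targets are (batch, seq_len).
B, T, V = 2, 5, 11
rand_logits = lambda: torch.randn(B, T, V)
rand_targets = lambda: torch.randint(0, V, (B, T))
print(combined_loss(rand_logits(), rand_targets(),
                    rand_logits(), rand_targets(),
                    rand_logits(), rand_targets()))
```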
arXiv Detail & Related papers (2021-09-30T16:41:19Z) - Teaching Autoregressive Language Models Complex Tasks By Demonstration [0.0]
It is possible to teach an autoregressive language model (GPT-Neo) to execute a mathematical task with a relatively small number of examples.
We show that after fine-tuning on 200 appropriately structured demonstrations of solving long division problems and reporting the remainders, the smallest available GPT-Neo model achieves over 80% accuracy.
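The demonstration template itself is not reproduced here; the generator below merely shows what "appropriately structured" long-division demonstrations that report the remainder might look like as fine-tuning data (the wording of the steps is an assumption).

```python
def long_division_demo(dividend: int, divisor: int) -> str:
    """Build a step-by-step long-division demonstration that ends with the
    quotient and remainder; the exact template is illustrative only."""
    steps, remainder = [], 0
    for digit in str(dividend):
        current = remainder * 10 + int(digit)
        q, remainder = divmod(current, divisor)
        steps.append(f"bring down {digit}: {current} / {divisor} = {q} remainder {remainder}")
    footer = f"Answer: quotient {dividend // divisor}, remainder {dividend % divisor}."
    return "\n".join([f"Compute {dividend} / {divisor}.", *steps, footer])

print(long_division_demo(1234, 7))
```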
arXiv Detail & Related papers (2021-09-05T15:25:28Z) - Investigating the Limitations of the Transformers with Simple Arithmetic Tasks [10.23804850480924]
We find that how a number is represented in its surface form has a strong influence on the model's accuracy.
We conclude that modern pretrained language models can easily learn arithmetic from very few examples.
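A quick way to see the surface-form effect is to generate a few alternative representations of the same number and compare how a model handles each; the three forms below are common choices, not necessarily the full set studied in the paper.

```python
def surface_forms(n: int) -> dict:
    """A few alternative surface forms of a number; which one the model is
    trained and evaluated on can change arithmetic accuracy substantially."""
    digits = str(n)
    return {
        "plain": digits,                          # "832"
        "spaced_digits": " ".join(digits),        # "8 3 2"
        "place_value": " ".join(                  # "8 10e2 3 10e1 2 10e0"
            f"{d} 10e{len(digits) - 1 - i}" for i, d in enumerate(digits)
        ),
    }

print(surface_forms(832))
```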
arXiv Detail & Related papers (2021-02-25T17:22:53Z) - Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
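As a rough sketch of the quantile-shift idea (not the article's exact procedure), the snippet below fits a classifier, then raises and lowers one feature of a real data point by a small quantile step and reports whether the predicted class changes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

def quantile_shift_effect(model, X, point, feature, step=0.05):
    """Shift one feature of `point` up and down by `step` quantiles of that
    feature's empirical distribution and return the resulting predictions."""
    values = np.sort(X[:, feature])
    base_q = np.searchsorted(values, point[feature]) / len(values)
    shifted = {}
    for direction, q in (("down", base_q - step), ("up", base_q + step)):
        q = min(max(q, 0.0), 1.0)
        p = point.copy()
        p[feature] = np.quantile(values, q)
        shifted[direction] = int(model.predict(p.reshape(1, -1))[0])
    base_pred = int(model.predict(point.reshape(1, -1))[0])
    return base_pred, shifted

base_pred, shifted = quantile_shift_effect(model, X, X[60].copy(), feature=2)
print(base_pred, shifted)  # predicted class before and after the small shifts
```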