Estimating Numbers without Regression
- URL: http://arxiv.org/abs/2310.06204v1
- Date: Mon, 9 Oct 2023 23:07:05 GMT
- Title: Estimating Numbers without Regression
- Authors: Avijit Thawani and Jay Pujara and Ashwin Kalyan
- Abstract summary: Despite recent successes in language models, their ability to represent numbers is insufficient.
Subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks.
We show that changing the model's vocabulary instead (e.g., introducing a new token for numbers in the range 10-100) is a far better trade-off.
- Score: 30.79061214333164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent successes in language models, their ability to represent
numbers is insufficient. Humans conceptualize numbers based on their
magnitudes, effectively projecting them on a number line; whereas subword
tokenization fails to explicitly capture magnitude by splitting numbers into
arbitrary chunks. To alleviate this shortcoming, alternative approaches have
been proposed that modify numbers at various stages of the language modeling
pipeline. These methods change either the (1) notation in which numbers are
written (e.g., scientific vs. decimal), the (2) vocabulary used to represent
numbers or the entire (3) architecture of the underlying language model, to
directly regress to a desired number.
Previous work suggests that architectural change helps achieve
state-of-the-art on number estimation but we find an insightful ablation:
changing the model's vocabulary instead (e.g., introducing a new token for numbers
in the range 10-100) is a far better trade-off. In the context of masked number
prediction, a carefully designed tokenization scheme is both the simplest to
implement and sufficient, i.e., with similar performance to the state-of-the-art
approach that requires making significant architectural changes. Finally, we
report similar trends on the downstream task of numerical fact estimation (for
Fermi Problems) and discuss reasons behind our findings.
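As a rough illustration of the vocabulary change discussed above (not the paper's exact scheme), the sketch below replaces every numeral with a single magnitude-bucket token, so that masked number prediction reduces to classifying over a handful of exponent ranges. The token names and regular expression are assumptions for illustration.

```python
import math
import re

# Hypothetical magnitude-bucket tokenizer: every numeral in the text is
# replaced by a single token encoding only its order of magnitude,
# e.g. 37 -> "[NUM_1e1]" (a number in the range 10-100).
def magnitude_token(value: float) -> str:
    if value == 0:
        return "[NUM_0]"
    exponent = math.floor(math.log10(abs(value)))
    return f"[NUM_1e{exponent}]"

def bucket_numbers(text: str) -> str:
    # Replace every integer or decimal literal with its magnitude token.
    return re.sub(
        r"\d+(?:\.\d+)?",
        lambda m: magnitude_token(float(m.group())),
        text,
    )

print(bucket_numbers("The bridge is 1280 m long and took 4.5 years to build."))
# -> "The bridge is [NUM_1e3] m long and took [NUM_1e0] years to build."
```

With such a vocabulary, predicting "[NUM_1e1]" for a masked numeral directly conveys that the answer lies between 10 and 100.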
Related papers
- Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models [2.5346260093097017]
We present two versions of a number token loss for language models.
The first is based on an $L_p$ loss between the ground truth token value and the weighted sum of the predicted class probabilities.
The second loss minimizes the Wasserstein-1 distance between the distribution of the predicted output probabilities and the ground truth distribution.
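A minimal sketch of the first loss variant, assuming digit tokens 0-9 occupy the first ten vocabulary ids; the slicing, renormalization, and choice of p are illustrative, not the paper's exact formulation.

```python
import torch

# Penalize the L_p distance between the true digit value and the expected
# value of the predicted distribution over digit tokens (value-weighted sum
# of the predicted class probabilities).
def number_token_loss(logits: torch.Tensor, target_values: torch.Tensor,
                      num_number_tokens: int = 10, p: int = 1) -> torch.Tensor:
    # logits: (batch, vocab) unnormalized scores at positions holding a digit
    # target_values: (batch,) ground-truth numeric value of each digit
    probs = logits[:, :num_number_tokens].softmax(dim=-1)      # (batch, 10)
    token_values = torch.arange(num_number_tokens, dtype=probs.dtype)
    expected = (probs * token_values).sum(dim=-1)              # weighted sum
    return (expected - target_values).abs().pow(p).mean()

logits = torch.randn(4, 32)                    # toy vocabulary of 32 tokens
targets = torch.tensor([3.0, 7.0, 0.0, 9.0])   # true digit values
print(number_token_loss(logits, targets))
```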
arXiv Detail & Related papers (2024-11-04T13:43:24Z)
- How to Leverage Digit Embeddings to Represent Numbers? [13.880400817682059]
Generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance.
Character-level embeddings of numbers have emerged as a promising approach to improve number representation.
We use mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models.
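A hypothetical version of this aggregation, using place value as the mathematical prior that weights each digit embedding; the normalization and embedding size are assumptions for illustration.

```python
import torch

# Aggregate character-level digit embeddings into one number embedding,
# weighting each digit by its (normalized) place value.
digit_embedding = torch.nn.Embedding(10, 64)   # one vector per digit 0-9

def aggregate_number(number: str) -> torch.Tensor:
    digits = torch.tensor([int(d) for d in number])            # "305" -> [3, 0, 5]
    place_values = 10.0 ** torch.arange(len(number) - 1, -1, -1)
    weights = place_values / place_values.sum()                # normalized prior
    return (weights.unsqueeze(-1) * digit_embedding(digits)).sum(dim=0)

vec = aggregate_number("305")   # a single 64-d vector to feed the transformer
print(vec.shape)                # torch.Size([64])
```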
arXiv Detail & Related papers (2024-07-01T01:31:41Z)
- Laying Anchors: Semantically Priming Numerals in Language Modeling [11.831883526217942]
We introduce strategies to semantically prime numerals in any corpus by generating anchors governed by the distribution of numerals in said corpus.
We demonstrate significant improvements in the mathematical grounding of our learned embeddings.
arXiv Detail & Related papers (2024-04-02T00:02:00Z)
- NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning [27.584258258635945]
Language models struggle with handling numerical data and performing arithmetic operations.
We propose a simple adjustment to how numbers are represented by including the count of digits before each number.
Requiring the model to consider the number of digits first enhances its reasoning before it generates the actual number.
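A minimal sketch of such a digit-count prefix; the bracketed output format is an assumption, not necessarily the paper's exact notation.

```python
import re

# Rewrite every number so the model sees how many digits it has before the
# digits themselves, e.g. 1987 -> {4:1987}.
def add_digit_counts(text: str) -> str:
    return re.sub(
        r"\d+",
        lambda m: "{" + f"{len(m.group())}:{m.group()}" + "}",
        text,
    )

print(add_digit_counts("She was born in 1987 and has 2 cats."))
# -> "She was born in {4:1987} and has {1:2} cats."
```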
arXiv Detail & Related papers (2024-03-30T19:46:59Z)
- xVal: A Continuous Number Encoding for Large Language Models [42.19323262199993]
We propose xVal, a numerical encoding scheme that represents any real number using just a single token.
We empirically evaluate our proposal on a number of synthetic and real-world datasets.
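A sketch of a single-token continuous encoding in this spirit: one shared [NUM] token whose embedding is scaled by the numeric value. The module name, the placeholder value 1.0 at non-number positions, and the lack of value normalization are assumptions for illustration.

```python
import torch

class ContinuousNumberEmbedding(torch.nn.Module):
    def __init__(self, vocab_size: int, d_model: int, num_token_id: int):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, d_model)
        self.num_token_id = num_token_id

    def forward(self, token_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # values holds the numeric value at [NUM] positions and 1.0 elsewhere
        x = self.embed(token_ids)                          # (batch, seq, d_model)
        scale = torch.where(token_ids == self.num_token_id,
                            values, torch.ones_like(values))
        return x * scale.unsqueeze(-1)                     # scale [NUM] embeddings

emb = ContinuousNumberEmbedding(vocab_size=100, d_model=32, num_token_id=3)
ids = torch.tensor([[5, 3, 7]])            # "... [NUM] ..."
vals = torch.tensor([[1.0, 0.25, 1.0]])    # 0.25 is the number at the [NUM] slot
print(emb(ids, vals).shape)                # torch.Size([1, 3, 32])
```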
arXiv Detail & Related papers (2023-10-04T17:26:16Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
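A sketch of the quantization step implied by the name: sampled generations are split into quantile bins by reward, and each sample is tagged with a reward token used for conditioning. The token names and bin count are assumptions, and the rest of the training loop (supervised updates, KL regularization) is omitted.

```python
import numpy as np

def quantize_rewards(rewards: list[float], num_bins: int = 5) -> list[str]:
    # Quantile edges split the samples into num_bins equally populated bins.
    edges = np.quantile(rewards, np.linspace(0, 1, num_bins + 1)[1:-1])
    bins = np.digitize(rewards, edges)      # 0 = lowest reward, num_bins-1 = highest
    return [f"[REWARD_{b}]" for b in bins]

rewards = [0.1, 0.9, 0.4, 0.7, 0.2, 0.95]
print(quantize_rewards(rewards, num_bins=3))
# -> ['[REWARD_0]', '[REWARD_2]', '[REWARD_1]', '[REWARD_1]', '[REWARD_0]', '[REWARD_2]']
```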
arXiv Detail & Related papers (2022-05-26T21:11:51Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both in one extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Numerical reasoning in machine reading comprehension tasks: are we there yet? [79.07883990966077]
Numerical reasoning based machine reading comprehension is a task that involves reading comprehension along with using arithmetic operations such as addition, subtraction, sorting, and counting.
The DROP benchmark is a recent dataset that has inspired the design of NLP models aimed at solving this task.
The current standings of these models in the DROP leaderboard, over standard metrics, suggest that the models have achieved near-human performance.
arXiv Detail & Related papers (2021-09-16T20:13:56Z)
- NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
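A hypothetical decomposition along these lines: the exponent gets a learned embedding and the mantissa is projected separately (a linear layer stands in for the prototype-based embedding); the sizes and the additive combination are assumptions for illustration.

```python
import math
import torch

class NumeralEmbedding(torch.nn.Module):
    def __init__(self, d_model: int = 64, min_exp: int = -8, max_exp: int = 8):
        super().__init__()
        self.min_exp = min_exp
        self.exp_embed = torch.nn.Embedding(max_exp - min_exp + 1, d_model)
        self.mantissa_proj = torch.nn.Linear(1, d_model)   # prototype stand-in

    def forward(self, value: float) -> torch.Tensor:
        # Decompose value into mantissa * 10**exponent and embed each part.
        exponent = 0 if value == 0 else math.floor(math.log10(abs(value)))
        mantissa = 0.0 if value == 0 else value / (10.0 ** exponent)
        exp_idx = torch.tensor([exponent - self.min_exp])
        mant = torch.tensor([[mantissa]], dtype=torch.float32)
        return self.exp_embed(exp_idx) + self.mantissa_proj(mant)

emb = NumeralEmbedding()
print(emb(6371.0).shape)   # torch.Size([1, 64])
```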
arXiv Detail & Related papers (2021-09-07T15:06:12Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.