NumGPT: Improving Numeracy Ability of Generative Pre-trained Models
- URL: http://arxiv.org/abs/2109.03137v1
- Date: Tue, 7 Sep 2021 15:06:12 GMT
- Title: NumGPT: Improving Numeracy Ability of Generative Pre-trained Models
- Authors: Zhihua Jin, Xin Jiang, Xingbo Wang, Qun Liu, Yong Wang, Xiaozhe Ren,
Huamin Qu
- Abstract summary: We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
- Score: 59.931394234642816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing generative pre-trained language models (e.g., GPT) focus on modeling
the language structure and semantics of general texts. However, those models do
not consider the numerical properties of numbers and cannot perform robustly on
numerical reasoning tasks (e.g., math word problems and measurement
estimation). In this paper, we propose NumGPT, a generative pre-trained model
that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the
mantissa of the number and an individual embedding to encode the exponent of
the number. A numeral-aware loss function is designed to integrate numerals
into the pre-training objective of NumGPT. We conduct extensive experiments on
four different datasets to evaluate the numeracy ability of NumGPT. The
experiment results show that NumGPT outperforms baseline models (e.g., GPT and
GPT with DICE) on a range of numerical reasoning tasks such as measurement
estimation, number comparison, math word problems, and magnitude
classification. Ablation studies are also conducted to evaluate the impact of
pre-training and model hyperparameters on the performance.
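The mantissa/exponent decomposition described in the abstract can be sketched as follows. This is an illustrative NumPy sketch, not NumGPT's actual implementation: the prototype values, the Gaussian-kernel similarity, the embedding sizes, and the additive combination of the two parts are all assumptions here, and in the paper the embeddings are learned jointly with the model rather than fixed.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

D = 8           # embedding dimension (illustrative)
K = 10          # number of mantissa prototypes
EXP_RANGE = 20  # supported exponents: -10 .. 9

prototypes = np.linspace(1.0, 9.9, K)       # prototype mantissa values in [1, 10)
proto_embs = rng.normal(size=(K, D))        # learned in the real model; random here
exp_embs = rng.normal(size=(EXP_RANGE, D))  # one lookup embedding per exponent

def mantissa_exponent(x):
    """Decompose a positive number into a mantissa in [1, 10) and an integer exponent."""
    e = math.floor(math.log10(abs(x)))
    return x / 10 ** e, e

def numeral_embedding(x, sigma=1.0):
    """Encode the mantissa as a similarity-weighted mixture of prototype
    embeddings and the exponent as a lookup embedding, then combine them."""
    m, e = mantissa_exponent(x)
    sims = np.exp(-((m - prototypes) ** 2) / (2 * sigma**2))  # Gaussian kernel
    weights = sims / sims.sum()
    return weights @ proto_embs + exp_embs[e + EXP_RANGE // 2]
```

Nearby values (e.g. 3.2 and 3.3) produce similar mantissa mixtures, so their embeddings stay close, while values of a different order of magnitude are separated by the exponent embedding.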
Related papers
- NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning [27.584258258635945]
Language models struggle with handling numerical data and performing arithmetic operations.
We propose a simple adjustment to how numbers are represented by including the count of digits before each number.
Requiring the model to consider the number of digits first enhances its reasoning before it generates the actual number.
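The digit-count encoding can be illustrated with a one-line transform. This is a sketch of the idea only; the `{count:number}` template used here is an assumption, not necessarily the paper's exact format.

```python
import re

def add_digit_counts(text: str) -> str:
    """Prefix every integer in the text with its digit count,
    NumeroLogic-style, using an illustrative {count:number} template."""
    return re.sub(r"\d+", lambda m: f"{{{len(m.group())}:{m.group()}}}", text)
```

For example, `add_digit_counts("The bridge is 1234 m long")` yields `"The bridge is {4:1234} m long"`, so the model sees the magnitude before the digits.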
arXiv Detail & Related papers (2024-03-30T19:46:59Z)
- Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data [10.124148115680315]
We propose a hierarchical taxonomy for numerical reasoning skills with more than ten reasoning types across four levels.
We conduct a comprehensive evaluation of state-of-the-art models to identify reasoning challenges specific to them.
Our results show that no model consistently excels across all numerical reasoning types.
arXiv Detail & Related papers (2023-11-03T20:05:30Z)
- NumHG: A Dataset for Number-Focused Headline Generation [28.57003500212883]
Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text.
We introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation.
We evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability.
arXiv Detail & Related papers (2023-09-04T09:03:53Z)
- FERMAT: An Alternative to Accuracy for Numerical Reasoning [11.893004722079557]
Numerical reasoning is typically measured with a single accuracy score on existing datasets.
We introduce FERMAT, a multi-view evaluation set for numerical reasoning in English.
FERMAT evaluates models on key aspects of numerical reasoning, such as number understanding, mathematical operations, and training dependency.
arXiv Detail & Related papers (2023-05-27T15:00:45Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach, called Arithmetic-Based Pretraining, that addresses this shortcoming in a single extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Impact of Pretraining Term Frequencies on Few-Shot Reasoning [51.990349528930125]
We investigate how well pretrained language models reason with terms that are less frequent in the pretraining data.
We measure the strength of this correlation for a number of GPT-based language models on various numerical deduction tasks.
Although LMs exhibit strong performance at few-shot numerical reasoning tasks, our results raise the question of how much models actually generalize beyond pretraining data.
arXiv Detail & Related papers (2022-02-15T05:43:54Z)
- Numerical reasoning in machine reading comprehension tasks: are we there yet? [79.07883990966077]
Numerical-reasoning-based machine reading comprehension combines reading comprehension with arithmetic operations such as addition, subtraction, sorting, and counting.
The DROP benchmark is a recent dataset that has inspired the design of NLP models aimed at solving this task.
The current standings of these models in the DROP leaderboard, over standard metrics, suggest that the models have achieved near-human performance.
arXiv Detail & Related papers (2021-09-16T20:13:56Z)
- Investigating the Limitations of the Transformers with Simple Arithmetic Tasks [10.23804850480924]
We find that how a number is represented in its surface form has a strong influence on the model's accuracy.
We conclude that modern pretrained language models can easily learn arithmetic from very few examples.
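The surface-form effect this entry describes can be made concrete by rendering one value several ways. The variant names below are illustrative, not the paper's taxonomy; the point is only that the same number admits different tokenizable forms.

```python
def surface_forms(n: int) -> dict[str, str]:
    """Render the same integer in several surface forms; such variants can
    change a model's arithmetic accuracy even though the value is identical."""
    s = str(n)
    return {
        "plain": s,                    # e.g. "1234"
        "spaced_digits": " ".join(s),  # "1 2 3 4" -- forces digit-level tokens
        "comma_grouped": f"{n:,}",     # "1,234"
    }
```

Probing a model with each form of the same problem separates what it knows about arithmetic from how its tokenizer happens to split numbers.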
arXiv Detail & Related papers (2021-02-25T17:22:53Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.