Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained
Language Models
- URL: http://arxiv.org/abs/2205.06733v2
- Date: Fri, 9 Jun 2023 08:04:40 GMT
- Title: Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained
Language Models
- Authors: Dominic Petrak, Nafise Sadat Moosavi, Iryna Gurevych
- Abstract summary: State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
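We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both the limited number representation of common tokenisers and the lack of numeracy-oriented pretraining objectives in one extended pretraining step.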
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
- Score: 67.48894919842576
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art pretrained language models tend to perform below their
capabilities when applied out-of-the-box on tasks that require understanding
and working with numbers. Recent work suggests two main reasons for this: (1)
popular tokenisation algorithms have limited expressiveness for numbers, and
(2) common pretraining objectives do not target numeracy. Approaches that
address these shortcomings usually require architectural changes or pretraining
from scratch. In this paper, we propose a new extended pretraining approach
called Arithmetic-Based Pretraining that jointly addresses both in one extended
pretraining step without requiring architectural changes or pretraining from
scratch. Arithmetic-Based Pretraining combines contrastive learning to improve
the number representation, and a novel extended pretraining objective called
Inferable Number Prediction Task to improve numeracy. Our experiments show the
effectiveness of Arithmetic-Based Pretraining in three different tasks that
require improved numeracy, i.e., reading comprehension in the DROP dataset,
inference-on-tables in the InfoTabs dataset, and table-to-text generation in
the WikiBio and SciGen datasets.
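To make reason (1) above concrete, the following minimal sketch contrasts a greedy longest-match subword segmentation with character-level segmentation of numbers. It is an illustration under stated assumptions, not the tokeniser or method used in the paper: `TOY_VOCAB`, `greedy_subword_tokenise`, and `char_tokenise` are hypothetical names, and the vocabulary is made up for the example.

```python
# Toy illustration only: this vocabulary is hypothetical and far smaller
# than any real subword vocabulary.
TOY_VOCAB = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
             "12", "123", "45", "00"}


def greedy_subword_tokenise(text, vocab):
    """Greedy longest-match segmentation, a simplified stand-in for
    subword tokenisers (assumption: real tokenisers differ in detail)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until an entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: emit the single character.
            tokens.append(text[i])
            i += 1
    return tokens


def char_tokenise(text):
    """Character-level segmentation: one token per digit."""
    return list(text)


if __name__ == "__main__":
    for number in ["1234", "1245", "12345"]:
        print(number,
              greedy_subword_tokenise(number, TOY_VOCAB),
              char_tokenise(number))
```

In this toy setting, numerically close values such as 1234 and 1245 share no subword boundary structure, whereas the character-level split always aligns digits by position. This is one way to picture the "limited expressiveness for numbers" that the abstract attributes to popular tokenisation algorithms.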
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
- Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models [29.367678364485794]
We show how to design efficacious data distributions and learning rate schedules for continued pretraining of language models.
We show an improvement of 9% in average model accuracy compared to the baseline of continued training on the pretraining set.
arXiv Detail & Related papers (2024-07-09T22:37:59Z)
- Unified Pretraining for Recommendation via Task Hypergraphs [55.98773629788986]
We propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs.
For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction.
A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation.
arXiv Detail & Related papers (2023-10-20T05:33:21Z)
- Teaching Arithmetic to Small Transformers [39.72665384986095]
This study investigates how small transformers can efficiently learn arithmetic operations.
We first demonstrate that conventional training data is not the most effective for arithmetic learning.
We then train on chain-of-thought style data that includes intermediate step results.
arXiv Detail & Related papers (2023-07-07T04:33:31Z)
- SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing [19.779493883522072]
This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
We propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
arXiv Detail & Related papers (2021-11-18T02:51:04Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)