NT5?! Training T5 to Perform Numerical Reasoning
- URL: http://arxiv.org/abs/2104.07307v1
- Date: Thu, 15 Apr 2021 08:34:44 GMT
- Title: NT5?! Training T5 to Perform Numerical Reasoning
- Authors: Peng-Jian Yang, Ying Ting Chen, Yuechan Chen, Daniel Cer
- Abstract summary: Numerical reasoning over text (NRoT) presents unique challenges that are not well addressed by existing pre-training objectives.
We show that by training the T5 multitasking framework with multiple numerical reasoning datasets of increasing difficulty, good performance can be achieved without manually engineering partitioned functionality.
- Score: 0.8827543048499855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical reasoning over text (NRoT) presents unique challenges that are not
well addressed by existing pre-training objectives. We explore five sequential
training schedules that adapt a pre-trained T5 model for NRoT. Our final model
is adapted from T5, but further pre-trained on three datasets designed to
strengthen skills necessary for NRoT and general reading comprehension before
being fine-tuned on the Discrete Reasoning over Text (DROP) dataset. The
training improves DROP's adjusted F1 performance (a numeracy-focused score)
from 45.90 to 70.83. Our model closes in on GenBERT (72.4), a custom BERT-Base
model using the same datasets with significantly more parameters. We show that
by training the T5 multitasking framework with multiple numerical reasoning
datasets of increasing difficulty, good performance on DROP can be achieved
without manually engineering partitioned functionality between distributed and
symbolic modules.
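As a rough illustration of the recipe the abstract describes (further pre-training T5 on skill-building datasets, then fine-tuning on DROP), the sketch below uses Hugging Face Transformers with a generic text-to-text training loop. It is a minimal sketch, not the paper's code: the dataset names and the `load_pairs` helper are hypothetical placeholders, and the paper's actual sequential training schedules and data pipeline may differ.
```python
# Minimal sketch (not the paper's code): stage-wise text-to-text training of T5,
# first on a mixture of skill-building datasets, then fine-tuned on DROP-style data.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset
from transformers import T5ForConditionalGeneration, T5TokenizerFast

def load_pairs(name):
    # Hypothetical stand-in for real data loading; returns toy (input, target) pairs.
    return [("answer the question: 3 + 4 = ?", "7")] * 32

class TextToTextDataset(Dataset):
    """Wraps (input_text, target_text) pairs for seq2seq training."""
    def __init__(self, pairs, tokenizer, max_len=512):
        self.pairs, self.tok, self.max_len = pairs, tokenizer, max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        src, tgt = self.pairs[i]
        enc = self.tok(src, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = self.tok(tgt, truncation=True, max_length=64,
                          padding="max_length", return_tensors="pt").input_ids
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": labels.squeeze(0)}

def train(model, dataset, device, epochs=1, lr=1e-4, batch_size=8):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device)

# Stage 1: further pre-train on a mixture of numerical-reasoning / reading
# comprehension datasets (placeholder names; the paper explores several
# sequential schedules rather than a single flat mixture).
mixture = ConcatDataset([
    TextToTextDataset(load_pairs("synthetic_numeric"), tok),
    TextToTextDataset(load_pairs("synthetic_textual"), tok),
    TextToTextDataset(load_pairs("reading_comprehension"), tok),
])
train(model, mixture, device)

# Stage 2: fine-tune on DROP-style passage/question -> answer pairs.
train(model, TextToTextDataset(load_pairs("drop"), tok), device)
```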
Related papers
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources [1.9813574408340644]
We present nanoT5, a framework for efficient pre-training and fine-tuning of T5 models.
nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance.
We make our contributions, including configurations, insights, and pre-trained models, available to the public.
arXiv Detail & Related papers (2023-09-05T16:35:41Z)
- Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers [98.30298332661323]
This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5.
We develop a new model, METRO-T0, which is pretrained using the redesigned ELECTRA-Style pretraining strategies and then prompt-finetuned on a mixture of NLP tasks.
Our analysis of the model's neural activations and parameter sensitivity reveals that the effectiveness of METRO-T0 stems from a more balanced contribution of parameters and better utilization of their capacity.
arXiv Detail & Related papers (2023-05-21T21:06:23Z)
- Instruction Tuned Models are Quick Learners [20.771930945083994]
In this work, we demonstrate the sample efficiency of instruction tuned models over various tasks.
In the STL (single-task learning) setting, instruction tuned models trained on 25% of the downstream training data surpass SOTA performance on the downstream tasks.
In the MTL (multi-task learning) setting, an instruction tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a further improvement of 3.69 points.
arXiv Detail & Related papers (2023-05-17T22:30:01Z)
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning [118.70716915295091]
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022.
We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning.
To accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available.
arXiv Detail & Related papers (2023-01-31T15:03:44Z)
- Text Embeddings by Weakly-Supervised Contrastive Pre-training [98.31785569325402]
E5 is a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.
E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts.
arXiv Detail & Related papers (2022-12-07T09:25:54Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model [18.922352061424302]
We investigate the ability of the text-to-text transfer model (T5) to learn numeracy.
We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting.
Although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks (a toy text-to-text rendering of these tasks is sketched after this list).
arXiv Detail & Related papers (2021-09-10T05:33:17Z)
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
- Text-to-Text Pre-Training for Data-to-Text Tasks [9.690158790639131]
We study the pre-train + fine-tune strategy for data-to-text tasks.
Our experiments indicate that text-to-text pre-training in the form of T5 enables simple, end-to-end transformer-based models for data-to-text generation.
arXiv Detail & Related papers (2020-05-21T02:46:15Z)
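Tying back to the "Investigating Numeracy Learning Ability" entry above, here is a toy text-to-text rendering of the four numeracy tasks it lists (numeration, magnitude order prediction, finding minimum and maximum, sorting). The prompt wording, number ranges, and function names are invented for illustration and are not the templates used in that paper.
```python
# Toy illustration (invented formats): the four numeracy tasks from the
# "Investigating Numeracy Learning Ability" entry, rendered as text-to-text pairs.
import math
import random

WORDS = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
         5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine"}

def numeration(n):
    # Number word -> digits (restricted to single digits for simplicity).
    return f"convert to digits: {WORDS[n]}", str(n)

def magnitude_order(x):
    # Predict the order of magnitude (power of ten) of a positive integer.
    return f"order of magnitude: {x}", str(int(math.floor(math.log10(x))))

def find_extreme(xs, mode="max"):
    # Find the minimum or maximum of a series of numbers.
    pick = max if mode == "max" else min
    return f"find {mode}: {' '.join(map(str, xs))}", str(pick(xs))

def sort_series(xs):
    # Sort a series of numbers in ascending order.
    return f"sort ascending: {' '.join(map(str, xs))}", " ".join(map(str, sorted(xs)))

examples = [
    numeration(random.randint(0, 9)),
    magnitude_order(random.randint(1, 10_000)),
    find_extreme([random.randint(0, 100) for _ in range(5)], mode="min"),
    sort_series([random.randint(0, 100) for _ in range(5)]),
]
for source, target in examples:
    print(f"{source!r} -> {target!r}")
```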