NT5?! Training T5 to Perform Numerical Reasoning
- URL: http://arxiv.org/abs/2104.07307v1
- Date: Thu, 15 Apr 2021 08:34:44 GMT
- Title: NT5?! Training T5 to Perform Numerical Reasoning
- Authors: Peng-Jian Yang, Ying Ting Chen, Yuechan Chen, Daniel Cer
- Abstract summary: Numerical reasoning over text (NRoT) presents unique challenges that are not well addressed by existing pre-training objectives.
We show that by training the T5 multitasking framework with multiple numerical reasoning datasets of increasing difficulty, good performance can be achieved without manually engineering partitioned functionality.
- Score: 0.8827543048499855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical reasoning over text (NRoT) presents unique challenges that are not
well addressed by existing pre-training objectives. We explore five sequential
training schedules that adapt a pre-trained T5 model for NRoT. Our final model
is adapted from T5, but further pre-trained on three datasets designed to
strengthen skills necessary for NRoT and general reading comprehension before
being fine-tuned on the Discrete Reasoning over Text (DROP) dataset. The
training improves DROP's adjusted F1 performance (a numeracy-focused score)
from 45.90 to 70.83. Our model closes in on GenBERT (72.4), a custom BERT-Base
model using the same datasets with significantly more parameters. We show that
by training the T5 multitasking framework with multiple numerical reasoning
datasets of increasing difficulty, good performance on DROP can be achieved
without manually engineering partitioned functionality between distributed and
symbolic modules.
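As a rough illustration of the recipe the abstract describes (further pre-training T5 on skill-building datasets, then fine-tuning on DROP), the sketch below uses Hugging Face Transformers with a generic text-to-text training loop. It is a minimal sketch, not the paper's code: the dataset names and the `load_pairs` helper are hypothetical placeholders, and the paper's actual sequential training schedules and data pipeline may differ.
```python
# Minimal sketch (not the paper's code): stage-wise text-to-text training of T5,
# first on a mixture of skill-building datasets, then fine-tuned on DROP-style data.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset
from transformers import T5ForConditionalGeneration, T5TokenizerFast

def load_pairs(name):
    # Hypothetical stand-in for real data loading; returns toy (input, target) pairs.
    return [("answer the question: 3 + 4 = ?", "7")] * 32

class TextToTextDataset(Dataset):
    """Wraps (input_text, target_text) pairs for seq2seq training."""
    def __init__(self, pairs, tokenizer, max_len=512):
        self.pairs, self.tok, self.max_len = pairs, tokenizer, max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        src, tgt = self.pairs[i]
        enc = self.tok(src, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = self.tok(tgt, truncation=True, max_length=64,
                          padding="max_length", return_tensors="pt").input_ids
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": labels.squeeze(0)}

def train(model, dataset, device, epochs=1, lr=1e-4, batch_size=8):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device)

# Stage 1: further pre-train on a mixture of numerical-reasoning / reading
# comprehension datasets (placeholder names; the paper explores several
# sequential schedules rather than a single flat mixture).
mixture = ConcatDataset([
    TextToTextDataset(load_pairs("synthetic_numeric"), tok),
    TextToTextDataset(load_pairs("synthetic_textual"), tok),
    TextToTextDataset(load_pairs("reading_comprehension"), tok),
])
train(model, mixture, device)

# Stage 2: fine-tune on DROP-style passage/question -> answer pairs.
train(model, TextToTextDataset(load_pairs("drop"), tok), device)
```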
Related papers
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources [1.9813574408340644]
We present nanoT5, a framework for efficient pre-training and fine-tuning of T5 models.
nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance.
We make our contributions, including configurations, insights, and pre-trained models, available to the public.
arXiv Detail & Related papers (2023-09-05T16:35:41Z)
- Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers [98.30298332661323]
This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5.
We develop a new model, METRO-T0, which is pretrained using the redesigned ELECTRA-Style pretraining strategies and then prompt-finetuned on a mixture of NLP tasks.
Our analysis of the model's neural activations and parameter sensitivity reveals that the effectiveness of METRO-T0 stems from a more balanced contribution of parameters and better utilization of their capacity.
arXiv Detail & Related papers (2023-05-21T21:06:23Z)
- Instruction Tuned Models are Quick Learners [20.771930945083994]
In this work, we demonstrate the sample efficiency of instruction tuned models over various tasks.
In the STL (single-task learning) setting, instruction tuned models trained on 25% of the downstream training data surpass SOTA performance on the downstream tasks.
In the MTL (multi-task learning) setting, an instruction tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a further improvement of 3.69 points.
arXiv Detail & Related papers (2023-05-17T22:30:01Z)
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning [118.70716915295091]
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022.
We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning.
To accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available.
arXiv Detail & Related papers (2023-01-31T15:03:44Z)
- Text Embeddings by Weakly-Supervised Contrastive Pre-training [98.31785569325402]
E5 is a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.
E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts.
arXiv Detail & Related papers (2022-12-07T09:25:54Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model [18.922352061424302]
We investigate the ability of the text-to-text transfer model (T5) to learn numeracy.
We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting.
Although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks (a toy text-to-text rendering of these tasks is sketched after this list).
arXiv Detail & Related papers (2021-09-10T05:33:17Z)
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
- Text-to-Text Pre-Training for Data-to-Text Tasks [9.690158790639131]
We study the pre-train + fine-tune strategy for data-to-text tasks.
Our experiments indicate that text-to-text pre-training in the form of T5 enables simple, end-to-end transformer-based models for data-to-text generation.
arXiv Detail & Related papers (2020-05-21T02:46:15Z)
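Tying back to the "Investigating Numeracy Learning Ability" entry above, here is a toy text-to-text rendering of the four numeracy tasks it lists (numeration, magnitude order prediction, finding minimum and maximum, sorting). The prompt wording, number ranges, and function names are invented for illustration and are not the templates used in that paper.
```python
# Toy illustration (invented formats): the four numeracy tasks from the
# "Investigating Numeracy Learning Ability" entry, rendered as text-to-text pairs.
import math
import random

WORDS = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
         5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine"}

def numeration(n):
    # Number word -> digits (restricted to single digits for simplicity).
    return f"convert to digits: {WORDS[n]}", str(n)

def magnitude_order(x):
    # Predict the order of magnitude (power of ten) of a positive integer.
    return f"order of magnitude: {x}", str(int(math.floor(math.log10(x))))

def find_extreme(xs, mode="max"):
    # Find the minimum or maximum of a series of numbers.
    pick = max if mode == "max" else min
    return f"find {mode}: {' '.join(map(str, xs))}", str(pick(xs))

def sort_series(xs):
    # Sort a series of numbers in ascending order.
    return f"sort ascending: {' '.join(map(str, xs))}", " ".join(map(str, sorted(xs)))

examples = [
    numeration(random.randint(0, 9)),
    magnitude_order(random.randint(1, 10_000)),
    find_extreme([random.randint(0, 100) for _ in range(5)], mode="min"),
    sort_series([random.randint(0, 100) for _ in range(5)]),
]
for source, target in examples:
    print(f"{source!r} -> {target!r}")
```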