Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model
- URL: http://arxiv.org/abs/2109.04672v1
- Date: Fri, 10 Sep 2021 05:33:17 GMT
- Title: Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model
- Authors: Kuntal Kumar Pal and Chitta Baral
- Abstract summary: We investigate the ability of the text-to-text transfer learning model (T5) to learn numeracy.
We consider four numeracy tasks: numeration, magnitude order prediction, finding the minimum and maximum in a series, and sorting.
Although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
- Score: 18.922352061424302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based pre-trained language models have been tremendously
successful in most conventional NLP tasks, but they often struggle in tasks
where numerical understanding is required. Possible reasons include tokenizers
and pre-training objectives that are not specifically designed to learn and
preserve numeracy. Here we investigate the ability of the text-to-text transfer
learning model (T5), which has outperformed its predecessors on conventional
NLP tasks, to learn numeracy. We consider four numeracy tasks: numeration,
magnitude order prediction, finding the minimum and maximum in a series, and
sorting. We find that, although T5 models perform reasonably well in the
interpolation setting, they struggle considerably in the extrapolation setting
across all four tasks.
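The abstract frames all four tasks as text-to-text problems for T5 but does not give the exact serialization, so the snippet below is only a minimal Python sketch of how such (input, target) pairs could be constructed, along with an interpolation/extrapolation split by number range. The prompt prefixes ("numeration:", "magnitude:", etc.), the digit-by-digit spell-out, and the number ranges are illustrative assumptions, not the paper's actual format.

```python
import random

# Toy word list for a digit-by-digit spell-out (illustration only; the
# paper's numeration task maps between word and digit forms of numbers).
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

def spell_out(n: int) -> str:
    """Spell out a non-negative integer digit by digit."""
    return " ".join(WORDS[int(d)] for d in str(n))

def make_examples(numbers):
    """Build hypothetical (source, target) text pairs for the four tasks."""
    pairs = []
    for n in numbers:
        # 1. Numeration: word form -> digit form.
        pairs.append((f"numeration: {spell_out(n)}", str(n)))
        # 2. Magnitude order prediction: number -> order of magnitude.
        pairs.append((f"magnitude: {n}", str(len(str(n)) - 1)))
    series = random.sample(numbers, k=5)
    # 3. Finding the minimum and maximum in a series.
    pairs.append((f"min max: {' '.join(map(str, series))}",
                  f"{min(series)} {max(series)}"))
    # 4. Sorting the series in ascending order.
    pairs.append((f"sort: {' '.join(map(str, series))}",
                  " ".join(map(str, sorted(series)))))
    return pairs

if __name__ == "__main__":
    random.seed(0)
    # Interpolation: test numbers come from the same range seen in training.
    # Extrapolation: test numbers are larger than anything seen in training.
    train_numbers = random.sample(range(0, 10_000), 20)
    extrapolation_numbers = random.sample(range(10_000, 1_000_000), 20)
    for src, tgt in make_examples(train_numbers)[:4]:
        print(f"[interpolation] {src!r} -> {tgt!r}")
    for src, tgt in make_examples(extrapolation_numbers)[:2]:
        print(f"[extrapolation] {src!r} -> {tgt!r}")
```

Pairs like these would then be tokenized and fine-tuned with the standard T5 sequence-to-sequence objective; the abstract's finding is that accuracy holds up when test numbers fall inside the training range (interpolation) but degrades sharply when they fall outside it (extrapolation).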
Related papers
- Number Cookbook: Number Understanding of Language Models and How to Improve It [63.9542740221096]
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing.
This paper comprehensively investigates the numerical understanding and processing ability (NUPA) of LLMs.
arXiv Detail & Related papers (2024-11-06T08:59:44Z)
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Methods typically encode task information with a simple dataset name as a prefix to the encoder.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show that this not only allows the model to better learn shared knowledge across different tasks during training, but also lets us control the model by composing new configurations.
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses these shortcomings in one extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5 [3.04585143845864]
We propose a unified framework for Lifelong Few-shot Language Learning (LFLL) based on prompt tuning of T5.
Our framework, LFPT5, takes full advantage of prompt tuning's strong few-shot learning ability and simultaneously trains the model as a task solver and a data generator.
With extensive experiments, we demonstrate that LFPT5 can be applied to various types of tasks and significantly outperforms previous methods in different LFLL settings.
arXiv Detail & Related papers (2021-10-14T12:06:29Z)
- NT5?! Training T5 to Perform Numerical Reasoning [0.8827543048499855]
Numerical reasoning over text (NRoT) presents unique challenges that are not well addressed by existing pre-training objectives.
We show that the T5 multitasking framework can be trained with multiple numerical reasoning datasets of increasing difficulty without manually engineering partitioned functionality.
arXiv Detail & Related papers (2021-04-15T08:34:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.