Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model
- URL: http://arxiv.org/abs/2109.04672v1
- Date: Fri, 10 Sep 2021 05:33:17 GMT
- Title: Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model
- Authors: Kuntal Kumar Pal and Chitta Baral
- Abstract summary: We investigate the ability of the text-to-text transfer learning model (T5) to learn numeracy.
We consider four numeracy tasks: numeration, magnitude order prediction, finding the minimum and maximum in a series, and sorting.
Although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
- Score: 18.922352061424302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based pre-trained language models have been tremendously
successful in most conventional NLP tasks, but they often struggle in tasks
where numerical understanding is required. Possible reasons include tokenizers
and pre-training objectives that are not specifically designed to learn and
preserve numeracy. Here we investigate the ability of the text-to-text transfer
learning model (T5), which has outperformed its predecessors on conventional
NLP tasks, to learn numeracy. We consider four numeracy tasks: numeration,
magnitude order prediction, finding the minimum and maximum in a series, and
sorting. We find that, although T5 models perform reasonably well in the
interpolation setting, they struggle considerably in the extrapolation setting
across all four tasks.
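The abstract frames all four tasks as text-to-text problems for T5 but does not give the exact serialization, so the snippet below is only a minimal Python sketch of how such (input, target) pairs could be constructed, along with an interpolation/extrapolation split by number range. The prompt prefixes ("numeration:", "magnitude:", etc.), the digit-by-digit spell-out, and the number ranges are illustrative assumptions, not the paper's actual format.

```python
import random

# Toy word list for a digit-by-digit spell-out (illustration only; the
# paper's numeration task maps between word and digit forms of numbers).
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

def spell_out(n: int) -> str:
    """Spell out a non-negative integer digit by digit."""
    return " ".join(WORDS[int(d)] for d in str(n))

def make_examples(numbers):
    """Build hypothetical (source, target) text pairs for the four tasks."""
    pairs = []
    for n in numbers:
        # 1. Numeration: word form -> digit form.
        pairs.append((f"numeration: {spell_out(n)}", str(n)))
        # 2. Magnitude order prediction: number -> order of magnitude.
        pairs.append((f"magnitude: {n}", str(len(str(n)) - 1)))
    series = random.sample(numbers, k=5)
    # 3. Finding the minimum and maximum in a series.
    pairs.append((f"min max: {' '.join(map(str, series))}",
                  f"{min(series)} {max(series)}"))
    # 4. Sorting the series in ascending order.
    pairs.append((f"sort: {' '.join(map(str, series))}",
                  " ".join(map(str, sorted(series)))))
    return pairs

if __name__ == "__main__":
    random.seed(0)
    # Interpolation: test numbers come from the same range seen in training.
    # Extrapolation: test numbers are larger than anything seen in training.
    train_numbers = random.sample(range(0, 10_000), 20)
    extrapolation_numbers = random.sample(range(10_000, 1_000_000), 20)
    for src, tgt in make_examples(train_numbers)[:4]:
        print(f"[interpolation] {src!r} -> {tgt!r}")
    for src, tgt in make_examples(extrapolation_numbers)[:2]:
        print(f"[extrapolation] {src!r} -> {tgt!r}")
```

Pairs like these would then be tokenized and fine-tuned with the standard T5 sequence-to-sequence objective; the abstract's finding is that accuracy holds up when test numbers fall inside the training range (interpolation) but degrades sharply when they fall outside it (extrapolation).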
Related papers
- Number Cookbook: Number Understanding of Language Models and How to Improve It [63.9542740221096]
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing.
This paper comprehensively investigates the numerical understanding and processing ability (NUPA) of LLMs.
arXiv Detail & Related papers (2024-11-06T08:59:44Z)
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Methods typically encode task information with a simple dataset name as a prefix to the encoder.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show that this not only allows the model to better learn shared knowledge across different tasks during training, but also lets us control the model by composing new configurations.
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses these shortcomings in one extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5 [3.04585143845864]
We propose a unified framework for Lifelong Few-shot Language Learning (LFLL) based on prompt tuning of T5.
Our framework, LFPT5, takes full advantage of prompt tuning's strong few-shot learning ability and simultaneously trains the model as a task solver and a data generator.
With extensive experiments, we demonstrate that LFPT5 can be applied to various types of tasks and significantly outperforms previous methods in different LFLL settings.
arXiv Detail & Related papers (2021-10-14T12:06:29Z)
- NT5?! Training T5 to Perform Numerical Reasoning [0.8827543048499855]
Numerical reasoning over text (NRoT) presents unique challenges that are not well addressed by existing pre-training objectives.
We show that the T5 multitasking framework can be trained with multiple numerical reasoning datasets of increasing difficulty without manually engineering partitioned functionality.
arXiv Detail & Related papers (2021-04-15T08:34:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.