Injecting Numerical Reasoning Skills into Language Models
- URL: http://arxiv.org/abs/2004.04487v1
- Date: Thu, 9 Apr 2020 11:14:56 GMT
- Title: Injecting Numerical Reasoning Skills into Language Models
- Authors: Mor Geva, Ankit Gupta, Jonathan Berant
- Abstract summary: High-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective only.
We show that numerical reasoning is amenable to automatic data generation, and thus one can inject this skill into pre-trained LMs.
We show that our model, GenBERT, dramatically improves performance on DROP (49.3 $\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art models of comparable size.
- Score: 41.78745615537762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained language models (LMs) are known to encode substantial
amounts of linguistic information. However, high-level reasoning skills, such
as numerical reasoning, are difficult to learn from a language-modeling
objective only. Consequently, existing models for numerical reasoning have used
specialized architectures with limited flexibility. In this work, we show that
numerical reasoning is amenable to automatic data generation, and thus one can
inject this skill into pre-trained LMs, by generating large amounts of data,
and training in a multi-task setup. We show that pre-training our model,
GenBERT, on this data, dramatically improves performance on DROP (49.3
$\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art
models of comparable size, while using a simple and general-purpose
encoder-decoder architecture. Moreover, GenBERT generalizes well to math word
problem datasets, while maintaining high performance on standard RC tasks. Our
approach provides a general recipe for injecting skills into large pre-trained
LMs, whenever the skill is amenable to automatic data augmentation.
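The core idea above is that numerical reasoning examples can be generated automatically at scale. A minimal sketch of such a generator is below; the templates and arithmetic operations are illustrative assumptions, not the paper's exact generation scheme:

```python
import random

def generate_numeric_example(rng: random.Random) -> dict:
    """Generate one synthetic numeric QA pair from a simple template.

    Hypothetical templates; GenBERT's actual data covers richer
    numeric and textual patterns.
    """
    a, b = rng.randint(0, 10_000), rng.randint(0, 10_000)
    op_name, op_fn = rng.choice([
        ("plus", lambda x, y: x + y),
        ("minus", lambda x, y: x - y),
    ])
    question = f"What is {a} {op_name} {b}?"
    return {"question": question, "answer": str(op_fn(a, b))}

# Build a small synthetic dataset; in practice millions of such
# examples would be used for an additional pre-training stage.
rng = random.Random(0)
dataset = [generate_numeric_example(rng) for _ in range(3)]
```

Pairs like these can then be mixed with ordinary language-modeling data in a multi-task pre-training setup, so the model learns the skill without losing its general abilities.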
Related papers
- DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries [0.0]
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS)
The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
arXiv Detail & Related papers (2024-03-29T22:59:34Z)
- The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction [22.659005954676598]
We show that it is possible to significantly improve the performance of Large Language Models by selectively removing higher-order components of their weight matrices.
This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed.
We show extensive experiments demonstrating the generality of this finding across language models and datasets.
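Removing higher-order components of a weight matrix amounts to replacing it with a low-rank approximation via truncated SVD. A minimal sketch of this post-training intervention, with an arbitrary matrix size and retained rank (the paper treats the layer and rank as hyperparameters):

```python
import numpy as np

def rank_reduce(W: np.ndarray, k: int) -> np.ndarray:
    """Replace W with its best rank-k approximation (truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the k largest singular values/vectors.
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

# Stand-in for a trained layer's weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_low = rank_reduce(W, k=8)
```

No retraining is needed: the reduced matrix simply replaces the original in the forward pass.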
arXiv Detail & Related papers (2023-12-21T03:51:08Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
arXiv Detail & Related papers (2021-07-15T11:37:14Z)
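Generating question-paragraph pairs from semi-structured tables can be sketched as below; the table and question template are invented for illustration, and the actual pipeline covers 16 distinct reasoning skills:

```python
# A toy semi-structured table standing in for scraped tabular data.
table = {
    "title": "Olympic gold medals",
    "rows": [
        {"country": "A", "gold": 10},
        {"country": "B", "gold": 7},
    ],
}

def table_to_examples(table: dict) -> list[dict]:
    """Turn each table row into a (question, context, answer) triple.

    The context paragraph is synthesized by verbalizing every row,
    so answering requires reading the relevant fact out of it.
    """
    context = " ".join(
        f"{row['country']} won {row['gold']} gold medals."
        for row in table["rows"]
    )
    return [
        {
            "question": f"How many gold medals did {row['country']} win?",
            "context": context,
            "answer": str(row["gold"]),
        }
        for row in table["rows"]
    ]

examples = table_to_examples(table)
```

Such synthetic question-paragraph pairs are then used in an extra pre-training step before fine-tuning on downstream reading-comprehension tasks.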
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.