Injecting Numerical Reasoning Skills into Language Models
- URL: http://arxiv.org/abs/2004.04487v1
- Date: Thu, 9 Apr 2020 11:14:56 GMT
- Title: Injecting Numerical Reasoning Skills into Language Models
- Authors: Mor Geva, Ankit Gupta, Jonathan Berant
- Abstract summary: High-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective only.
We show that numerical reasoning is amenable to automatic data generation, and thus one can inject this skill into pre-trained LMs.
We show that our model, GenBERT, dramatically improves performance on DROP (49.3 $\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art models of comparable size.
- Score: 41.78745615537762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained language models (LMs) are known to encode substantial
amounts of linguistic information. However, high-level reasoning skills, such
as numerical reasoning, are difficult to learn from a language-modeling
objective only. Consequently, existing models for numerical reasoning have used
specialized architectures with limited flexibility. In this work, we show that
numerical reasoning is amenable to automatic data generation, and thus one can
inject this skill into pre-trained LMs, by generating large amounts of data,
and training in a multi-task setup. We show that pre-training our model,
GenBERT, on this data, dramatically improves performance on DROP (49.3
$\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art
models of comparable size, while using a simple and general-purpose
encoder-decoder architecture. Moreover, GenBERT generalizes well to math word
problem datasets, while maintaining high performance on standard RC tasks. Our
approach provides a general recipe for injecting skills into large pre-trained
LMs, whenever the skill is amenable to automatic data augmentation.
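To make the recipe concrete, here is a minimal sketch of the kind of automatic data generation the abstract describes: template-based synthesis of arithmetic question-answer pairs that can be mixed into pre-training. The templates, operations, and number ranges below are illustrative assumptions, not GenBERT's actual generation scheme, which is richer and is paired with standard reading-comprehension training in a multi-task setup.

```python
import random

# Minimal sketch (not GenBERT's actual pipeline): synthesize arithmetic
# question-answer pairs that can be mixed into pre-training alongside the
# model's usual objectives. Templates and number ranges are illustrative.

OPS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "times": lambda a, b: a * b,
}

def generate_numeric_example(max_value=10_000):
    """Return one (question, answer) pair, e.g. ('What is 532 plus 68?', '600')."""
    op_name, op = random.choice(list(OPS.items()))
    a, b = random.randint(0, max_value), random.randint(0, max_value)
    return f"What is {a} {op_name} {b}?", str(op(a, b))

def generate_dataset(n_examples=100_000, seed=0):
    random.seed(seed)
    return [generate_numeric_example() for _ in range(n_examples)]

if __name__ == "__main__":
    for question, answer in generate_dataset(n_examples=3):
        print(question, "->", answer)
```

Examples like these cost nothing to label, which is what makes the skill "amenable to automatic data augmentation" in the authors' phrasing.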
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models [12.424072830053445]
We present a model merging methodology that addresses the difficulty of fine-tuning Large Language Models (LLMs) for target tasks in non-English languages.
We fine-tune separate "experts" on math instruction data in English and on generic instruction data in the target language.
We replace the top and bottom transformer layers of the math expert directly with layers from the language expert, which consequently enhances math performance in the target language.
arXiv Detail & Related papers (2024-10-02T08:53:07Z)
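The layer-swapping operation summarized above is essentially checkpoint surgery, so a short sketch may help. The snippet below is a hypothetical illustration: the file names, the parameter-name pattern, and the number of swapped layers are placeholders, and the paper itself should be consulted for the actual merging procedure.

```python
import re
import torch

# Hypothetical sketch of the layer-swapping idea: keep the "math expert's"
# weights but overwrite its bottom and top transformer layers with those of a
# "language expert" fine-tuned on the target language. File names, the
# "layers.<i>." parameter-name pattern, and NUM_SWAPPED are placeholders.

MATH_EXPERT = "math_expert.pt"       # torch state_dict of the math-tuned model
LANG_EXPERT = "language_expert.pt"   # torch state_dict of the target-language model
NUM_SWAPPED = 4                      # layers to take from each end of the stack

def layer_index(param_name):
    """Return the transformer layer index encoded in a parameter name, or None."""
    match = re.search(r"layers\.(\d+)\.", param_name)
    return int(match.group(1)) if match else None

def swap_layers(math_state, lang_state, num_layers, k=NUM_SWAPPED):
    merged = dict(math_state)  # default: math expert's parameters
    for name, tensor in lang_state.items():
        idx = layer_index(name)
        if idx is not None and (idx < k or idx >= num_layers - k):
            merged[name] = tensor  # bottom-k and top-k layers from the language expert
    return merged

if __name__ == "__main__":
    math_state = torch.load(MATH_EXPERT, map_location="cpu")
    lang_state = torch.load(LANG_EXPERT, map_location="cpu")
    num_layers = 1 + max(i for i in map(layer_index, math_state) if i is not None)
    torch.save(swap_layers(math_state, lang_state, num_layers), "merged_expert.pt")
```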
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
arXiv Detail & Related papers (2021-07-15T11:37:14Z)
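The table-to-question generation in the last entry is, like GenBERT's, a templated process, so a toy sketch can illustrate it. Everything below (the table contents, the single "superlative" template, and the paragraph format) is an illustrative assumption; the paper's generator covers 16 different reasoning skills over real semi-structured tables.

```python
# Hypothetical sketch (not PReasM's generator): turn one column of a
# semi-structured table into a question-paragraph-answer triple using a
# single "superlative" template. Table contents are illustrative.

table = {
    "columns": ["Nation", "Gold"],
    "rows": [["Soviet Union", 43], ["United States", 34], ["Italy", 13]],
}

def superlative_example(table, column="Gold"):
    """Build one (question, paragraph, answer) example asking for the largest value."""
    col = table["columns"].index(column)
    best_row = max(table["rows"], key=lambda row: row[col])
    question = f"Which nation had the highest number of {column.lower()} medals?"
    paragraph = " ".join(
        f"{row[0]} won {row[col]} {column.lower()} medals." for row in table["rows"]
    )
    return question, paragraph, best_row[0]

if __name__ == "__main__":
    q, p, a = superlative_example(table)
    print("Q:", q)
    print("P:", p)
    print("A:", a)
```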
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of any of the information presented and is not responsible for any consequences arising from its use.