Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
- URL: http://arxiv.org/abs/2107.07261v1
- Date: Thu, 15 Jul 2021 11:37:14 GMT
- Title: Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
- Authors: Ori Yoran, Alon Talmor, Jonathan Berant
- Abstract summary: We propose to leverage semi-structured tables and automatically generate question-paragraph pairs at scale.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
- Score: 32.55545292360155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models pre-trained with a language modeling objective possess ample world
knowledge and language skills, but are known to struggle in tasks that require
reasoning. In this work, we propose to leverage semi-structured tables, and
automatically generate at scale question-paragraph pairs, where answering the
question requires reasoning over multiple facts in the paragraph. We add a
pre-training step over this synthetic data, which includes examples that
require 16 different reasoning skills such as number comparison, conjunction,
and fact composition. To improve data efficiency, we propose sampling
strategies that focus training on reasoning skills the model is currently
lacking. We evaluate our approach on three reading comprehension datasets that
are focused on reasoning, and show that our model, PReasM, substantially
outperforms T5, a popular pre-trained encoder-decoder model. Moreover, sampling
examples based on current model errors leads to faster training and higher
overall performance.
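
The following Python sketch illustrates the two ideas in the abstract under assumed formats: turning a semi-structured table into synthetic question-paragraph-answer examples that require reasoning over multiple facts, and sampling reasoning skills in proportion to the model's current error rate on each skill. The table contents, question templates, skill names, and error-rate bookkeeping are illustrative assumptions, not the paper's actual generation grammar or training schedule.

```python
import random

# Illustrative sketch only: table, templates, skill names, and error tracking
# are assumptions, not taken from the paper.

TABLE = {
    "title": "Olympic gold medals",
    "rows": [["Norway", 14], ["Germany", 12], ["Canada", 11]],
}

def table_to_paragraph(table):
    # Verbalize each table row as a simple fact sentence.
    facts = [f"{country} won {gold} gold medals." for country, gold in table["rows"]]
    return " ".join(facts)

def number_comparison_example(table, rng):
    # Answering requires comparing two numeric facts from the paragraph.
    (c1, g1), (c2, g2) = rng.sample(table["rows"], 2)
    question = f"Which country won more gold medals, {c1} or {c2}?"
    answer = c1 if g1 > g2 else c2
    return {"question": question, "paragraph": table_to_paragraph(table), "answer": answer}

def numerical_superlative_example(table, rng):
    # Answering requires finding the maximum over all numeric facts.
    question = "Which country won the most gold medals?"
    answer = max(table["rows"], key=lambda row: row[1])[0]
    return {"question": question, "paragraph": table_to_paragraph(table), "answer": answer}

GENERATORS = {
    "number comparison": number_comparison_example,
    "numerical superlative": numerical_superlative_example,
}

def sample_skill(error_rate, rng):
    # Skills with higher current error rates are sampled more often.
    skills = list(error_rate)
    return rng.choices(skills, weights=[error_rate[s] for s in skills], k=1)[0]

rng = random.Random(0)
error_rate = {"number comparison": 0.6, "numerical superlative": 0.2}
skill = sample_skill(error_rate, rng)
print(skill, GENERATORS[skill](TABLE, rng))
```

The snippet only conveys the shape of the procedure; the paper describes a much larger pool of tables and 16 reasoning skills, with sampling driven by the model's current errors during pre-training.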
Related papers
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates.
We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Teaching Broad Reasoning Skills via Decomposition-Guided Contexts [50.114651561111245]
Question-answering datasets require a broad set of reasoning skills.
We show how to use question decompositions to teach these broad reasoning skills in a robust fashion.
arXiv Detail & Related papers (2022-05-25T05:13:21Z)
- How much pretraining data do language models need to learn syntax? [12.668478784932878]
Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks.
We study the impact of pretraining data size on the knowledge of the models using RoBERTa.
arXiv Detail & Related papers (2021-09-07T15:51:39Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLM pre-training succeeds on downstream tasks almost entirely because of its ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with, on average, 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data [84.87772675171412]
We study the circumstances under which explanations of individual data points can improve modeling performance.
We make use of three existing datasets with explanations: e-SNLI, TACRED, SemEval.
arXiv Detail & Related papers (2021-02-03T18:57:08Z)
- Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z)
- Injecting Numerical Reasoning Skills into Language Models [41.78745615537762]
High-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective only.
We show that numerical reasoning is amenable to automatic data generation, and thus one can inject this skill into pre-trained LMs.
We show that our model, GenBERT, dramatically improves performance on DROP (49.3 $\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art models of comparable size.
arXiv Detail & Related papers (2020-04-09T11:14:56Z)
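
As a purely illustrative companion to the GenBERT entry above, the sketch below generates templated arithmetic questions whose answers are computed by the generator itself, which is the general flavor of automatic data generation for numerical reasoning; the template and value ranges are assumptions, not the paper's actual generation scheme.

```python
import random

# Hypothetical template and value ranges; answers are computed programmatically.
def make_arithmetic_example(rng):
    a, b, c = (rng.randint(1, 500) for _ in range(3))
    question = f"What is {a} + {b} - {c}?"
    return {"question": question, "answer": str(a + b - c)}

rng = random.Random(42)
for example in (make_arithmetic_example(rng) for _ in range(3)):
    print(example)
```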