Modeling Preconditions in Text with a Crowd-sourced Dataset
- URL: http://arxiv.org/abs/2010.02429v3
- Date: Wed, 14 Oct 2020 17:56:03 GMT
- Title: Modeling Preconditions in Text with a Crowd-sourced Dataset
- Authors: Heeyoung Kwon, Mahnaz Koupaee, Pratyush Singh, Gargi Sawhney, Anmol
Shukla, Keerthi Kumar Kallur, Nathanael Chambers and Niranjan Balasubramanian
- Abstract summary: This paper introduces PeKo, a crowd-sourced annotation of preconditions between event pairs in newswire.
We also introduce two challenge tasks aimed at modeling preconditions.
Evaluation on both tasks shows that modeling preconditions is challenging even for today's large language models.
- Score: 17.828175478279654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preconditions provide a form of logical connection between events that
explains why some events occur together and information that is complementary
to the more widely studied relations such as causation, temporal ordering,
entailment, and discourse relations. Modeling preconditions in text has been
hampered in part due to the lack of large scale labeled data grounded in text.
This paper introduces PeKo, a crowd-sourced annotation of preconditions between
event pairs in newswire, an order of magnitude larger than prior text
annotations. To complement this new corpus, we also introduce two challenge
tasks aimed at modeling preconditions: (i) Precondition Identification -- a
standard classification task defined over pairs of event mentions, and (ii)
Precondition Generation -- a generative task aimed at testing a more general
ability to reason about a given event. Evaluation on both tasks shows that
modeling preconditions is challenging even for today's large language models
(LMs). This suggests that precondition knowledge is not easily accessible in
LM-derived representations alone. Our generation results show that fine-tuning
an LM on PeKo yields better conditional relations than training on raw text
or temporally-ordered corpora.
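To make the Precondition Identification task concrete, below is a minimal sketch of one way it could be framed as binary classification over a marked pair of event mentions with a generic pretrained encoder. The encoder name, the [E1]/[E2] marker scheme, the example sentence, and the label names are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch (not the authors' implementation): Precondition Identification
# framed as binary classification over a marked pair of event mentions.
# Assumes the Hugging Face transformers library; the encoder name, the
# [E1]/[E2] markers, and the example are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumption: any pretrained encoder could be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# The 2-way classification head starts randomly initialized; in practice it
# would be fine-tuned on PeKo-style labeled event pairs.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def mark_events(text: str, trigger1: str, trigger2: str) -> str:
    """Wrap the two candidate event triggers so the encoder sees which pair
    is being asked about (is trigger1 a precondition of trigger2?)."""
    marked = text.replace(trigger1, f"[E1] {trigger1} [/E1]", 1)
    return marked.replace(trigger2, f"[E2] {trigger2} [/E2]", 1)

sentence = "The treaty took effect after both governments signed the agreement."
inputs = tokenizer(mark_events(sentence, "signed", "took effect"),
                   return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, 2)
probs = torch.softmax(logits, dim=-1)[0]
print({"not_precondition": float(probs[0]), "precondition": float(probs[1])})
```

Precondition Generation could be sketched in the same spirit by fine-tuning a sequence-to-sequence LM to emit a plausible precondition event given a target event; the abstract's comparison of PeKo fine-tuning against raw-text and temporally-ordered training refers to that generative setting.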
Related papers
- Large Language Models as Event Forecasters [10.32127659470566]
Key elements of human events are extracted as quadruples that consist of subject, relation, object, and timestamp.
These quadruples or quintuples, when organized within a specific domain, form a temporal knowledge graph (TKG). A minimal sketch of this quadruple representation appears after the Related papers list.
arXiv Detail & Related papers (2024-06-15T04:09:31Z)
- A Generative Approach for Script Event Prediction via Contrastive Fine-tuning [35.87615178251874]
Script event prediction aims to predict the subsequent event given the context.
Recent works have attempted to improve event correlation reasoning by using pretrained language models and incorporating external knowledge.
We propose a novel generative approach for this task, in which a pretrained language model is fine-tuned with an event-centric pretraining objective.
arXiv Detail & Related papers (2022-12-07T07:32:47Z)
- Zero-Shot On-the-Fly Event Schema Induction [61.91468909200566]
We present a new approach in which large language models are utilized to generate source documents that allow predicting, given a high-level event definition, the specific events, arguments, and relations between them.
Using our model, complete schemas on any topic can be generated on-the-fly without any manual data collection, i.e., in a zero-shot manner.
arXiv Detail & Related papers (2022-10-12T14:37:00Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept; a schematic equation for this view appears after the Related papers list.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, so they suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events [41.35277143634441]
We propose a neural architecture and a set of training methods for ordering events by predicting temporal relations.
Given that a key challenge with this task is the scarcity of annotated data, our models rely on either pretrained representations or transfer and multi-task learning.
Experiments on the MATRES dataset of English documents establish a new state-of-the-art on this task.
arXiv Detail & Related papers (2020-04-08T23:17:10Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
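For the "Large Language Models as Event Forecasters" entry above, a minimal, hypothetical sketch of the (subject, relation, object, timestamp) quadruple representation, and of a temporal knowledge graph as a time-ordered collection of such quadruples, might look like the following; the class, field names, and example facts are invented for illustration.

```python
# Hypothetical sketch of the quadruple representation described in
# "Large Language Models as Event Forecasters": each event is a
# (subject, relation, object, timestamp) tuple, and a temporal knowledge
# graph (TKG) is a time-ordered collection of such tuples.
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass(frozen=True)
class EventQuadruple:
    subject: str
    relation: str
    obj: str            # "object" shadows a Python builtin, so "obj" is used
    timestamp: date

# A toy domain-specific TKG: quadruples organized and sorted by time.
tkg: List[EventQuadruple] = sorted(
    [
        EventQuadruple("Country_A", "signs_treaty_with", "Country_B", date(2020, 3, 1)),
        EventQuadruple("Country_B", "imposes_sanctions_on", "Country_C", date(2020, 5, 9)),
    ],
    key=lambda q: q.timestamp,
)
for quad in tkg:
    print(quad)
```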
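For the "An Explanation of In-context Learning as Implicit Bayesian Inference" entry above, the claim can be rendered schematically as a marginalization over a latent concept; the notation below is generic rather than the paper's exact formulation.

```latex
% Schematic only: generic notation, not the paper's exact formulation.
% The LM's in-context prediction marginalizes over a latent concept \theta
% inferred from the prompt:
p(y \mid \mathrm{prompt})
  = \int p(y \mid \theta, \mathrm{prompt}) \, p(\theta \mid \mathrm{prompt}) \, d\theta
```

On this reading, adding more in-context examples sharpens the posterior p(theta | prompt) around the concept the examples share, so the marginal prediction approaches the predictor for that concept even though no parameters are updated.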