Improving Commonsense Causal Reasoning by Adversarial Training and Data
Augmentation
- URL: http://arxiv.org/abs/2101.04966v1
- Date: Wed, 13 Jan 2021 09:55:29 GMT
- Title: Improving Commonsense Causal Reasoning by Adversarial Training and Data
Augmentation
- Authors: Ieva Stali\=unait\.e, Philip John Gorinski, Ignacio Iacobacci
- Abstract summary: This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement on performance and on both datasets, even with only a small number of additionally generated data points.
- Score: 14.92157586545743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Determining the plausibility of causal relations between clauses is a
commonsense reasoning task that requires complex inference ability. The general
approach to this task is to train a large pretrained language model on a
specific dataset. However, the available training data for the task is often
scarce, which leads to instability of model training or reliance on the shallow
features of the dataset. This paper presents a number of techniques for making
models more robust in the domain of causal reasoning. Firstly, we perform
adversarial training by generating perturbed inputs through synonym
substitution. Secondly, based on a linguistic theory of discourse connectives,
we perform data augmentation using a discourse parser for detecting causally
linked clauses in large text, and a generative language model for generating
distractors. Both methods boost model performance on the Choice of Plausible
Alternatives (COPA) dataset, as well as on a Balanced COPA dataset, which is a
modified version of the original data that has been developed to avoid
superficial cues, leading to a more challenging benchmark. We show a
statistically significant improvement in performance and robustness on both
datasets, even with only a small number of additionally generated data points.
Related papers
- Dissecting vocabulary biases datasets through statistical testing and
automated data augmentation for artifact mitigation in Natural Language
Inference [3.154631846975021]
We focus on investigating dataset artifacts and developing strategies to address these issues.
We propose several automatic data augmentation strategies spanning character to word levels.
Experiments demonstrate that the proposed approaches effectively enhance model accuracy and reduce biases by up to 0.66% and 1.14%, respectively.
arXiv Detail & Related papers (2023-12-14T08:46:26Z) - Controlled Randomness Improves the Performance of Transformer Models [4.678970068275123]
We introduce controlled randomness, i.e. noise, into the training process to improve fine-tuning language models.
We find that adding such noise can improve the performance in our two downstream tasks of joint named entity recognition and relation extraction and text summarization.
arXiv Detail & Related papers (2023-10-20T14:12:55Z) - Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts.
Recent data augmentation methods often neglect the problem of grammatical incorrectness.
We propose a denoised structure-to-text augmentation framework for event extraction DAEE.
arXiv Detail & Related papers (2023-05-16T16:52:07Z) - Multi-Scales Data Augmentation Approach In Natural Language Inference
For Artifacts Mitigation And Pre-Trained Model Optimization [0.0]
We provide a variety of techniques for analyzing and locating dataset artifacts inside the crowdsourced Stanford Natural Language Inference corpus.
To mitigate dataset artifacts, we employ a unique multi-scale data augmentation technique with two distinct frameworks.
Our combination method enhances our model's resistance to perturbation testing, enabling it to continuously outperform the pre-trained baseline.
arXiv Detail & Related papers (2022-12-16T23:37:44Z) - CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation [91.16551253297588]
COunterfactual Generation via Retrieval and Editing (CORE) is a retrieval-augmented generation framework for creating diverse counterfactual perturbations for training.
CORE first performs a dense retrieval over a task-related unlabeled text corpus using a learned bi-encoder.
CORE then incorporates these into prompts to a large language model with few-shot learning capabilities, for counterfactual editing.
arXiv Detail & Related papers (2022-10-10T17:45:38Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - How Does Data Corruption Affect Natural Language Understanding Models? A
Study on GLUE datasets [4.645287693363387]
We show that performance remains high for most GLUE tasks when the models are fine-tuned or tested on corrupted data.
Our proposed data transformations can be used as a diagnostic tool for assessing the extent to which a specific dataset constitutes a proper testbed for evaluating models' language understanding capabilities.
arXiv Detail & Related papers (2022-01-12T13:35:53Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG)
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models as well as verify the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.