Improving Question Answering Model Robustness with Synthetic Adversarial
Data Generation
- URL: http://arxiv.org/abs/2104.08678v1
- Date: Sun, 18 Apr 2021 02:00:06 GMT
- Title: Improving Question Answering Model Robustness with Synthetic Adversarial
Data Generation
- Authors: Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus
Stenetorp, Douwe Kiela
- Abstract summary: State-of-the-art question answering models remain susceptible to a variety of adversarial attacks and are still far from obtaining human-level language understanding.
One proposed way forward is dynamic adversarial data collection, in which a human annotator attempts to create examples for which a model-in-the-loop fails.
In this work, we investigate several answer selection, question generation, and filtering methods that form a synthetic adversarial data generation pipeline.
Models trained on both synthetic and human-generated data outperform models not trained on synthetic adversarial data, and obtain state-of-the-art results on the AdversarialQA dataset, with overall performance gains of 3.7 F1.
- Score: 41.9785159975426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the availability of very large datasets and pretrained models,
state-of-the-art question answering models remain susceptible to a variety of
adversarial attacks and are still far from obtaining human-level language
understanding. One proposed way forward is dynamic adversarial data collection,
in which a human annotator attempts to create examples for which a
model-in-the-loop fails. However, this approach comes at a higher cost per
sample and slower pace of annotation, as model-adversarial data requires more
annotator effort to generate. In this work, we investigate several answer
selection, question generation, and filtering methods that form a synthetic
adversarial data generation pipeline that takes human-generated adversarial
samples and unannotated text to create synthetic question-answer pairs. Models
trained on both synthetic and human-generated data outperform models not
trained on synthetic adversarial data, and obtain state-of-the-art results on
the AdversarialQA dataset, with overall performance gains of 3.7 F1. Furthermore,
we find that training on the synthetic adversarial data improves model
generalisation across domains for non-adversarial data, demonstrating gains on
9 of the 12 datasets for MRQA. Lastly, we find that our models become
considerably more difficult to beat by human adversaries, with a drop in
macro-averaged validated model error rate from 17.6% to 8.8% when compared to
non-augmented models.
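To make the pipeline structure concrete, below is a minimal, self-contained Python sketch of the three stages the abstract names: answer selection, question generation, and answer-consistency filtering. Each stage is a deliberately simple stand-in for the trained models the paper actually uses, and all function names are illustrative assumptions, not the authors' code.

```python
# Sketch of a synthetic adversarial QA generation pipeline:
# (1) select candidate answers in unannotated text, (2) generate a
# question per answer, (3) filter pairs for answer consistency.
# Every stage is a trivial stand-in for a trained model.
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    context: str
    question: str
    answer: str


def select_answers(context: str, max_candidates: int = 5) -> List[str]:
    """Stand-in answer selector: capitalized spans as candidates.
    The paper instead learns answer selection from human adversarial data."""
    spans = re.findall(r"[A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*", context)
    return list(dict.fromkeys(spans))[:max_candidates]  # dedupe, keep order


def generate_question(context: str, answer: str) -> str:
    """Stand-in question generator; the paper fine-tunes a seq2seq model
    on human-written adversarial questions."""
    return f"What does the passage say about {answer}?"


def synthesize(context: str,
               qa_predict: Callable[[str, str], str]) -> List[QAPair]:
    """Generate candidate pairs, then keep only those for which a QA
    model recovers the intended answer (answer-consistency filtering)."""
    pairs = [QAPair(context, generate_question(context, a), a)
             for a in select_answers(context)]
    return [p for p in pairs if qa_predict(p.context, p.question) == p.answer]


if __name__ == "__main__":
    passage = "Marie Curie won the Nobel Prize in Physics in 1903."
    # Toy QA model: always predicts the first capitalized span.
    toy_qa = lambda ctx, q: select_answers(ctx)[0]
    for pair in synthesize(passage, toy_qa):
        print(pair.question, "->", pair.answer)
```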
Related papers
- Chatting Up Attachment: Using LLMs to Predict Adult Bonds [0.0]
We use GPT-4 and Claude 3 Opus to create agents that simulate adults with varying profiles, childhood memories, and attachment styles.
We evaluate our models using a transcript dataset from 9 humans who underwent the same interview protocol, analyzed and labeled by mental health professionals.
Our findings indicate that training the models using only synthetic data achieves performance comparable to training the models on human data.
arXiv Detail & Related papers (2024-08-31T04:29:19Z)
- Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models.
We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
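As a rough illustration of the setting this entry analyzes (the entry's contribution is a proof, not an algorithm), a curation-by-reward retraining loop might look like the following sketch; the top-fraction curation rule and all callables are assumptions.

```python
# Sketch of iterated retraining with reward-model curation: each round,
# sample from the current model, keep only the highest-reward samples,
# and retrain on them. Parameters and the top-k rule are illustrative.
def curated_retraining(model, sample, reward, retrain,
                       rounds: int = 5, keep_frac: float = 0.5):
    for _ in range(rounds):
        batch = sample(model, n=1000)                  # synthetic generations
        batch.sort(key=reward, reverse=True)           # score with reward model
        curated = batch[:int(keep_frac * len(batch))]  # curation step
        model = retrain(model, curated)                # next-generation model
    return model
```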
arXiv Detail & Related papers (2024-06-12T21:28:28Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
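A minimal sketch of the self-training loop this entry describes: sample candidate solutions, keep those that pass the scalar feedback check, and fine-tune on the survivors. Restarting fine-tuning from the base model each iteration follows the paper's setup; the helper callables here are assumptions.

```python
# Sketch of a ReST^EM-style loop: generate samples (E-step), keep the
# ones with positive scalar feedback, fine-tune on them (M-step).
def rest_em(base_model, problems, generate, feedback, finetune,
            iterations: int = 3, k: int = 8):
    model = base_model
    for _ in range(iterations):
        kept = []
        for problem in problems:
            for solution in generate(model, problem, k=k):  # E-step: sample
                if feedback(problem, solution) > 0:         # scalar check
                    kept.append((problem, solution))
        model = finetune(base_model, kept)                  # M-step
    return model
```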
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models [69.76066070227452]
*Data Synthesis* is a promising way to train a small model with very little labeled data.
We propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap.
Our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data.
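A minimal sketch of the S3-style loop as summarized above: seed a synthetic dataset with a large model, then repeatedly train the small model, collect its errors on a small real dev set, and have the large model extrapolate new examples from those errors. All callables stand in for LLM prompting and are assumptions, not the authors' code.

```python
# Sketch of Synthesis Step by Step: iteratively shrink the gap between
# synthetic and real data by synthesizing from the small model's errors.
def synthesis_step_by_step(llm_seed, llm_extrapolate, train, is_correct,
                           dev_set, rounds: int = 3):
    data = llm_seed(n=1000)                     # initial synthetic dataset
    for _ in range(rounds):
        small_model = train(data)
        errors = [x for x in dev_set if not is_correct(small_model, x)]
        data += llm_extrapolate(errors, n=200)  # target the distribution gap
    return train(data)
```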
arXiv Detail & Related papers (2023-10-20T17:14:25Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
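A minimal sketch of the mixed-dataset retraining this entry studies: each generation trains on a fixed pool of real data plus samples drawn from the previous generation's model. The mixing ratio `lam` is an illustrative parameter, not the paper's notation.

```python
# Sketch of iterative retraining on mixed real + self-generated data;
# keeping the real data in every round is what anchors the procedure.
def iterative_retraining(real_data, train, sample,
                         generations: int = 5, lam: float = 0.5):
    model = train(real_data)
    for _ in range(generations):
        synthetic = sample(model, n=int(lam * len(real_data)))
        model = train(real_data + synthetic)  # anchor on real data each round
    return model
```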
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
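A minimal sketch of a Deep-Generative-Ensemble-style predictor, under the assumption that ensemble members are generative models fit from different random seeds: draw one synthetic dataset from each member, train a downstream model per dataset, and average the predictions. Function names are illustrative.

```python
# Sketch of a DGE-style ensemble: each seed acts as one draw from the
# posterior over the generative process; downstream predictions are
# averaged across the ensemble.
def dge_predict(real_data, fit_generator, sample, fit_predictor, x,
                ensemble_size: int = 5):
    predictions = []
    for seed in range(ensemble_size):
        generator = fit_generator(real_data, seed=seed)  # one posterior draw
        synthetic = sample(generator, n=len(real_data))
        predictor = fit_predictor(synthetic)
        predictions.append(predictor(x))
    return sum(predictions) / len(predictions)           # ensemble average
```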
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Training Question Answering Models From Synthetic Data [26.91650323300262]
This work aims to narrow the gap between synthetic and human-generated question-answer pairs.
We synthesize questions and answers from a synthetic corpus generated by an 8.3 billion parameter GPT-2 model.
With no access to human supervision and only access to other models, we are able to train state-of-the-art question answering networks on entirely model-generated data.
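A minimal sketch of the fully model-generated setting this entry describes: even the passages come from a large generative LM, QA pairs are synthesized on top, and the QA model never sees human-labeled data. All callables are illustrative assumptions.

```python
# Sketch of fully synthetic QA training: model-generated corpus,
# model-generated QA pairs, no human supervision anywhere in the loop.
def train_from_synthetic(lm_generate_passage, synthesize_qa, train_qa,
                         n_passages: int = 10000):
    passages = [lm_generate_passage() for _ in range(n_passages)]
    qa_pairs = [pair for passage in passages
                for pair in synthesize_qa(passage)]  # e.g., roundtrip-filtered
    return train_qa(qa_pairs)
```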
arXiv Detail & Related papers (2020-02-22T01:49:27Z)
- Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension [27.538957000237176]
Humans create questions adversarially, such that the model fails to answer them correctly.
We collect 36,000 samples with progressively stronger models in the annotation loop.
We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets.
We find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop.
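A minimal sketch of the model-in-the-loop annotation protocol this entry describes: a human-written question is kept only if the model fails to answer it, here judged by a word-overlap F1 threshold. The threshold value and helper callables are illustrative assumptions.

```python
# Sketch of adversarial human annotation with a model in the loop:
# keep a sample only when the model's answer falls below an F1 cutoff.
def collect_adversarial(contexts, annotate, model_answer, f1,
                        threshold: float = 0.4):
    kept = []
    for context in contexts:
        question, answer = annotate(context)          # human writes a pair
        prediction = model_answer(context, question)  # model attempts it
        if f1(prediction, answer) < threshold:        # model beaten: keep
            kept.append((context, question, answer))
    return kept
```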
arXiv Detail & Related papers (2020-02-02T00:22:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.