Models in the Loop: Aiding Crowdworkers with Generative Annotation
Assistants
- URL: http://arxiv.org/abs/2112.09062v1
- Date: Thu, 16 Dec 2021 17:59:39 GMT
- Title: Models in the Loop: Aiding Crowdworkers with Generative Annotation
Assistants
- Authors: Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin
Jia, Douwe Kiela
- Abstract summary: We introduce Generative Annotation Assistants (GAAs), which provide real-time suggestions that annotators can approve, modify, or reject entirely.
GAAs provide significant efficiency benefits in terms of annotation speed, while leading to improved model fooling rates.
- Score: 41.9785159975426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Dynamic Adversarial Data Collection (DADC), human annotators are tasked
with finding examples that models struggle to predict correctly. Models trained
on DADC-collected training data have been shown to be more robust in
adversarial and out-of-domain settings, and are considerably harder for humans
to fool. However, DADC is more time-consuming than traditional data collection
and thus more costly per example. In this work, we examine if we can maintain
the advantages of DADC, without suffering the additional cost. To that end, we
introduce Generative Annotation Assistants (GAAs), generator-in-the-loop models
that provide real-time suggestions that annotators can either approve, modify,
or reject entirely. We collect training datasets in twenty experimental
settings and perform a detailed analysis of this approach for the task of
extractive question answering (QA) for both standard and adversarial data
collection. We demonstrate that GAAs provide significant efficiency benefits in
terms of annotation speed, while leading to improved model fooling rates. In
addition, we show that GAA-assisted data leads to higher downstream model
performance on a variety of question answering tasks.
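The abstract describes a generator-in-the-loop workflow: a GAA proposes a question (and answer) for a passage, the annotator approves it, modifies it, or rejects it and writes their own, and in adversarial collection the resulting example is checked against the QA model it is meant to fool. The sketch below illustrates that control flow only, as a minimal assumption-laden illustration; `suggest_qa`, `qa_model_answer`, and the `review` callback are placeholders, not the authors' models or annotation interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Example:
    passage: str
    question: str
    answer: str
    source: str          # "approved", "modified", or "written_from_scratch"
    fooled_model: bool   # meaningful in adversarial collection


def suggest_qa(passage: str) -> Tuple[str, str]:
    """Placeholder GAA: a real assistant would be a trained question/answer
    generator conditioned on the passage."""
    return "What is the passage mainly about?", passage.split(".")[0]


def qa_model_answer(passage: str, question: str) -> str:
    """Placeholder for the adversary QA model the annotator tries to fool."""
    return passage.split(".")[0]


def annotate(passage: str,
             review: Callable[[str, str], Tuple[str, Optional[str], Optional[str]]]) -> Example:
    """One annotation step: the GAA suggests, the human reviews.

    `review` stands in for the human annotator: given the suggested question
    and answer, it returns ("approve", None, None), ("modify", new_q, new_a),
    or ("reject", own_q, own_a).
    """
    q_sugg, a_sugg = suggest_qa(passage)
    decision, q, a = review(q_sugg, a_sugg)
    if decision == "approve":
        question, answer, source = q_sugg, a_sugg, "approved"
    elif decision == "modify":
        question, answer, source = q, a, "modified"
    else:
        question, answer, source = q, a, "written_from_scratch"

    # In adversarial collection, the example "fools" the model when its
    # prediction does not match the annotated answer (exact match here;
    # the paper's setting would use a proper QA metric).
    fooled = qa_model_answer(passage, question).strip() != answer.strip()
    return Example(passage, question, answer, source, fooled)


# Example usage with an annotator who keeps the suggestion as-is.
ex = annotate("DADC asks annotators to fool a model. It is costly per example.",
              lambda q, a: ("approve", None, None))
print(ex.source, ex.fooled_model)
```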
Related papers
- Adding Conditional Control to Diffusion Models with Reinforcement Learning [59.295203871547336]
Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples.
This work presents a novel method based on reinforcement learning (RL) to add additional controls, leveraging an offline dataset.
arXiv Detail & Related papers (2024-06-17T22:00:26Z)
- Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model, we train a discriminator model and use its outputs to recalibrate the sample weights (a minimal sketch of this reweighting idea appears after this list).
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z)
- Analyzing Dynamic Adversarial Training Data in the Limit [50.00850852546616]
Dynamic adversarial data collection (DADC) holds promise as an approach for generating diverse training sets.
We present the first study of longer-term DADC, where we collect 20 rounds of NLI examples for a small set of premise paragraphs.
Models trained on DADC examples make 26% fewer errors on our expert-curated test set compared to models trained on non-adversarial data.
arXiv Detail & Related papers (2021-10-16T08:48:52Z)
- Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several datasets, using a variety of state-of-the-art benchmarks, demonstrates that our approach achieves significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUG^C is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUG^C consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUG^C produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
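As referenced in the noisy-NER entry above, the discriminator-guided reweighting idea can be sketched as follows. This is a minimal illustration under assumed details: the temperature-scaled softmax mapping, the function name, and the example scores are placeholders, not the cited paper's exact recalibration scheme.

```python
import math
from typing import List


def recalibrate_weights(discriminator_scores: List[float],
                        temperature: float = 1.0) -> List[float]:
    """Map per-example 'looks clean' scores to sample weights that average to
    roughly 1.0, so weighted training keeps the same overall loss scale.
    A temperature-scaled softmax is one simple, assumed choice."""
    exps = [math.exp(s / temperature) for s in discriminator_scores]
    total = sum(exps)
    return [e * len(exps) / total for e in exps]


# Hypothetical discriminator outputs for four noisy NER sentences: higher means
# "more likely correctly labelled" according to a discriminator trained with
# guidance from the small clean set.
scores = [0.9, 0.2, 0.8, 0.1]
weights = recalibrate_weights(scores)
# The main NER model would then minimise sum_i weights[i] * loss_i.
print([round(w, 2) for w in weights])
```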