WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
- URL: http://arxiv.org/abs/2201.05955v1
- Date: Sun, 16 Jan 2022 03:13:49 GMT
- Title: WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
- Authors: Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
- Abstract summary: We introduce a novel paradigm for dataset creation based on human and machine collaboration.
We use dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instruct GPT-3 to compose new examples with similar patterns.
The resulting dataset, WANLI, consists of 108,357 natural language inference (NLI) examples that present unique empirical strengths.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A recurring challenge of crowdsourcing NLP datasets at scale is that human
writers often rely on repetitive patterns when crafting examples, leading to a
lack of linguistic diversity. We introduce a novel paradigm for dataset
creation based on human and machine collaboration, which brings together the
generative strength of language models and the evaluative strength of humans.
Starting with an existing dataset, MultiNLI, our approach uses dataset
cartography to automatically identify examples that demonstrate challenging
reasoning patterns, and instructs GPT-3 to compose new examples with similar
patterns. Machine generated examples are then automatically filtered, and
finally revised and labeled by human crowdworkers to ensure quality. The
resulting dataset, WANLI, consists of 108,357 natural language inference (NLI)
examples that present unique empirical strengths over existing NLI datasets.
Remarkably, training a model on WANLI instead of MNLI (which is 4 times larger)
improves performance on seven out-of-domain test sets we consider, including by
11% on HANS and 9% on Adversarial NLI. Moreover, combining MNLI with WANLI is
more effective than combining with other augmentation sets that have been
introduced. Our results demonstrate the potential of natural language
generation techniques to curate NLP datasets of enhanced quality and diversity.
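The pipeline hinges on dataset cartography: tracking, across training epochs, the model's probability of each example's gold label, then using the mean (confidence) and standard deviation (variability) of that probability to locate ambiguous examples worth seeding GPT-3 with. A minimal sketch of these statistics, assuming a hypothetical `gold_probs` array of recorded per-epoch gold-label probabilities (in practice collected from training checkpoints):

```python
# Sketch of dataset cartography statistics (Swayamdipta et al.).
# gold_probs: hypothetical (n_examples, n_epochs) array of the model's
# probability of the gold label, recorded once per epoch.
import numpy as np

def cartography_stats(gold_probs: np.ndarray):
    confidence = gold_probs.mean(axis=1)   # mean p(gold) across epochs
    variability = gold_probs.std(axis=1)   # std of p(gold) across epochs
    return confidence, variability

def most_ambiguous(gold_probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-variability ("ambiguous") examples."""
    _, variability = cartography_stats(gold_probs)
    return np.argsort(-variability)[:k]

probs = np.array([[0.90, 0.95, 0.97],   # easy-to-learn: high conf, low var
                  [0.20, 0.80, 0.40],   # ambiguous: high variability
                  [0.10, 0.15, 0.05]])  # hard-to-learn: low conf, low var
print(most_ambiguous(probs, 1))  # → [1]
```

The exact selection criterion WANLI applies to these statistics is described in the full paper; the thresholding above is only an illustrative assumption.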
Related papers
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
(arXiv, 2024-10-04)
- ViANLI: Adversarial Natural Language Inference for Vietnamese [1.907126872483548]
We introduce ViANLI, an adversarial NLI dataset for Vietnamese, to the NLP research community.
This dataset contains more than 10K premise-hypothesis pairs.
The accuracy of the most powerful model on the test set only reached 48.4%.
(arXiv, 2024-06-25)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
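The precision/recall idea for generative models can be sketched with k-nearest-neighbour support estimation (in the style of Kynkäänniemi et al.): precision asks what fraction of generated samples fall inside the support of the real data (quality), and recall asks what fraction of real samples fall inside the support of the generated data (diversity). The embeddings and `k` below are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch: precision/recall via k-NN support estimation.
# In an LLM setting the 2-D points would be text embeddings.
import numpy as np

def knn_radii(points: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbour (index 0 is self)."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]

def support_coverage(queries: np.ndarray, refs: np.ndarray, k: int = 3) -> float:
    """Fraction of queries falling inside any reference point's k-NN ball."""
    radii = knn_radii(refs, k)
    d = np.linalg.norm(queries[:, None] - refs[None, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

real = np.random.RandomState(0).normal(size=(200, 2))  # stand-in "real" corpus
gen = np.random.RandomState(1).normal(size=(200, 2))   # stand-in "generated" text
precision = support_coverage(gen, real)  # quality: generated within real support
recall = support_coverage(real, gen)     # diversity: real within generated support
```

Because both stand-in sets are drawn from the same distribution here, both scores come out high; a mode-collapsed generator would show high precision but low recall.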
(arXiv, 2024-02-16)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
(arXiv, 2024-01-11)
- A deep Natural Language Inference predictor without language-specific training data [44.26507854087991]
We present an NLP technique to tackle the problem of natural language inference (NLI) between pairs of sentences in a target language of choice, without a language-specific training dataset.
We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model.
The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset.
(arXiv, 2023-09-06)
- Improving Domain-Specific Retrieval by NLI Fine-Tuning [64.79760042717822]
This article investigates the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking.
We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data.
Our results show that NLI fine-tuning increases the performance of the models on both tasks and in both languages, with the potential to improve both mono- and multilingual models.
(arXiv, 2023-08-06)
- mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
(arXiv, 2022-12-20)
- Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization [0.0]
We provide a variety of techniques for analyzing and locating dataset artifacts inside the crowdsourced Stanford Natural Language Inference corpus.
To mitigate dataset artifacts, we employ a unique multi-scale data augmentation technique with two distinct frameworks.
Our combined method enhances our model's resistance to perturbation testing, enabling it to consistently outperform the pre-trained baseline.
(arXiv, 2022-12-16)
- Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks [0.07734726150561087]
The dataset contains entirely natural language utterances in Polish.
It is a representative sample in regards to frequency of main verbs and other linguistic features.
BERT-based models consuming only the input sentences capture most of the complexity of NLI/factivity.
(arXiv, 2022-01-10)
- OCNLI: Original Chinese Natural Language Inference [21.540733910984006]
We present the first large-scale NLI dataset for Chinese (consisting of 56,000 annotated sentence pairs), called the Original Chinese Natural Language Inference dataset (OCNLI).
Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or non-expert annotation.
We establish several baseline results on our dataset using state-of-the-art pre-trained models for Chinese, and find even the best performing models to be far outpaced by human performance.
(arXiv, 2020-10-12)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.