ANLIzing the Adversarial Natural Language Inference Dataset
- URL: http://arxiv.org/abs/2010.12729v1
- Date: Sat, 24 Oct 2020 01:03:51 GMT
- Title: ANLIzing the Adversarial Natural Language Inference Dataset
- Authors: Adina Williams, Tristan Thrush, Douwe Kiela
- Abstract summary: We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset.
We propose a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets.
- Score: 46.7480191735164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently
introduced large-scale human-and-model-in-the-loop natural language inference
dataset collected over multiple rounds. We propose a fine-grained annotation
scheme of the different aspects of inference that are responsible for the gold
classification labels, and use it to hand-code all three of the ANLI
development sets. We use these annotations to answer a variety of interesting
questions: which inference types are most common, which models have the highest
performance on each reasoning type, and which types are the most challenging
for state-of-the-art models? We hope that our annotations will enable more
fine-grained evaluation of models trained on ANLI, provide us with a deeper
understanding of where models fail and succeed, and help us determine how to
train better models in the future.
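The per-type annotations described in the abstract lend themselves to a simple fine-grained evaluation: group dev examples by reasoning type and score a model's predictions within each group. Below is a minimal sketch of such a breakdown; the file layout and field names ("uid", "label", "tags", and the file names) are hypothetical stand-ins for illustration, not the released ANLI or annotation format.

```python
# Minimal sketch: per-reasoning-type accuracy from hand-coded annotations.
# NOTE: the file layout and field names below are hypothetical, not the
# official ANLI / annotation release format.
import json
from collections import defaultdict

def per_type_accuracy(dev_path, pred_path):
    # Annotated dev examples, one JSON object per line:
    # {"uid": ..., "label": ..., "tags": ["Numerical", "Coreference", ...]}
    examples = [json.loads(line) for line in open(dev_path)]
    # Model predictions as a mapping {uid: predicted_label}.
    preds = json.load(open(pred_path))

    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        hit = preds.get(ex["uid"]) == ex["label"]
        # An example may carry several annotation tags; credit each of them.
        for tag in ex["tags"]:
            total[tag] += 1
            correct[tag] += int(hit)

    # Report accuracy per reasoning type, most frequent types first.
    for tag in sorted(total, key=total.get, reverse=True):
        print(f"{tag:>20s}: {correct[tag] / total[tag]:.3f} (n={total[tag]})")

if __name__ == "__main__":
    per_type_accuracy("anli_r1_dev_annotated.jsonl", "model_preds.json")
```

Running the same script over predictions from several models makes it easy to see which reasoning types are most common and which remain hardest for current systems.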
Related papers
- OLaLa: Ontology Matching with Large Language Models [2.211868306499727]
Ontology Matching is a challenging task where information in natural language is one of the most important signals to process.
With the rise of Large Language Models, it is possible to incorporate this knowledge more effectively into the matching pipeline.
We show that with only a handful of examples and a well-designed prompt, it is possible to achieve results that are on par with supervised matching systems.
arXiv Detail & Related papers (2023-11-07T09:34:20Z)
- POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models [62.23255433487586]
We propose an unsupervised fine-tuning framework to fine-tune the model or prompt on the unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data.
arXiv Detail & Related papers (2023-04-29T22:05:22Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Internet-augmented language models through few-shot prompting for open-domain question answering [6.573232954655063]
We capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges.
We use few-shot prompting to learn to condition language models on information returned from the web using Google Search.
We find that language models conditioned on the web surpass the performance of closed-book models of similar, or even larger, model sizes in open-domain question answering.
arXiv Detail & Related papers (2022-03-10T02:24:14Z)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
- Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference [22.648536283569747]
We propose models that leverage structured knowledge in different components of pre-trained models.
Our results show that the proposed models perform better than previous BERT-based state-of-the-art models.
arXiv Detail & Related papers (2021-09-08T21:28:12Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Natural Language Inference with a Human Touch: Using Human Explanations to Guide Model Attention [39.41947934589526]
Training with human explanations encourages models to attend more broadly across the sentences.
The supervised models attend to words humans believe are important, creating more robust and better performing NLI models.
arXiv Detail & Related papers (2021-04-16T14:45:35Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that provides a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.