No Strong Feelings One Way or Another: Re-operationalizing Neutrality in
Natural Language Inference
- URL: http://arxiv.org/abs/2306.09918v1
- Date: Fri, 16 Jun 2023 15:45:08 GMT
- Title: No Strong Feelings One Way or Another: Re-operationalizing Neutrality in
Natural Language Inference
- Authors: Animesh Nighojkar and Antonio Laverghetta Jr. and John Licato
- Abstract summary: Natural Language Inference (NLI) has been a cornerstone task in evaluating language models' inferential reasoning capabilities.
The standard three-way classification scheme used in NLI has well-known shortcomings in evaluating models' ability to capture the nuances of natural human reasoning.
We argue that the operationalization of the neutral label in current NLI datasets has low validity, is interpreted inconsistently, and that at least one important sense of neutrality is often ignored.
- Score: 6.485890157501745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Inference (NLI) has been a cornerstone task in evaluating
language models' inferential reasoning capabilities. However, the standard
three-way classification scheme used in NLI has well-known shortcomings in
evaluating models' ability to capture the nuances of natural human reasoning.
In this paper, we argue that the operationalization of the neutral label in
current NLI datasets has low validity, is interpreted inconsistently, and that
at least one important sense of neutrality is often ignored. We uncover the
detrimental impact of these shortcomings, which in some cases leads to
annotation datasets that actually decrease performance on downstream tasks. We
compare approaches to handling annotator disagreement and identify flaws in a
recent NLI dataset that designs an annotator study based on a problematic
operationalization. Our findings highlight the need for a more refined
evaluation framework for NLI, and we hope to spark further discussion and
action in the NLP community.
Related papers
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast the spotlight on the underexplored task of Natural Language Inference (NLI).
We validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation.
We research the correlation of widely used language generation metrics with human perception, in order for them to serve as a proxy towards robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Negation Triplet Extraction with Syntactic Dependency and Semantic Consistency [37.99421732397288]
SSENE is built on a generative pretrained language model (PLM) with an Encoder-Decoder architecture and a multi-task learning framework.
We have constructed a high-quality Chinese dataset, NegComment, based on user reviews from the real-world platform Meituan.
arXiv Detail & Related papers (2024-04-15T14:28:33Z)
- Uncertainty in Natural Language Processing: Sources, Quantification, and Applications [56.130945359053776]
We provide a comprehensive review of uncertainty-relevant works in the NLP field.
We first categorize the sources of uncertainty in natural language into three types, including input, system, and output.
We discuss the challenges of uncertainty estimation in NLP and potential future directions.
arXiv Detail & Related papers (2023-06-05T06:46:53Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning [85.1541170468617]
This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC).
Our proposed method boosts zero-shot performance on two commonsense reasoning benchmark datasets and on seven further commonsense question-answering datasets.
arXiv Detail & Related papers (2022-08-23T14:42:14Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
arXiv Detail & Related papers (2022-05-09T02:41:11Z)
- Few-shot Named Entity Recognition with Cloze Questions [3.561183926088611]
We propose a simple and intuitive adaptation of Pattern-Exploiting Training (PET), a recent approach that combines the cloze-questions mechanism and fine-tuning for few-shot learning.
Our approach achieves considerably better performance than standard fine-tuning and comparable or improved results with respect to other few-shot baselines.
arXiv Detail & Related papers (2021-11-24T11:08:59Z)
- Exploring Transitivity in Neural NLI Models through Veridicality [39.845425535943534]
We focus on the transitivity of inference relations, a fundamental property for systematically drawing inferences.
A model capturing transitivity can compose basic inference patterns and draw new inferences.
We find that current NLI models do not perform consistently well on transitivity inference tasks.
arXiv Detail & Related papers (2021-01-26T11:18:35Z)
- Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark [54.782397511033345]
Crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts.
We present a new unified cross-datasets benchmark with 14 NLI datasets and re-evaluate 9 widely-used neural network-based NLI models.
Our proposed evaluation scheme and experimental baselines could provide a basis to inspire future reliable NLI research.
arXiv Detail & Related papers (2020-10-15T11:50:12Z)
- Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference [59.62779187457773]
We propose a generative classifier for natural language inference (NLI).
We compare it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT.
Experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental settings.
arXiv Detail & Related papers (2020-10-08T04:44:00Z)
- Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation [14.431925736607043]
We present Monotonicity NLI (MoNLI), a new naturalistic dataset focused on lexical entailment and negation.
In behavioral evaluations, we find that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation.
In structural evaluations, we look for evidence that our top-performing BERT-based model has learned to implement the monotonicity algorithm behind MoNLI.
arXiv Detail & Related papers (2020-04-30T07:53:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.