Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning
- URL: http://arxiv.org/abs/2208.11007v1
- Date: Tue, 23 Aug 2022 14:42:14 GMT
- Title: Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning
- Authors: Letian Peng, Zuchao Li, Hai Zhao
- Abstract summary: This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC).
The proposed method boosts zero-shot performance on two commonsense reasoning benchmarks and seven further commonsense question-answering datasets.
- Score: 85.1541170468617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonsense reasoning is an appealing topic in natural language
processing (NLP) because it plays a fundamental role in supporting the
human-like behavior of NLP systems. With large-scale language models as the
backbone, unsupervised pre-training on massive corpora shows the potential
to capture commonsense knowledge. Current pre-trained language model
(PLM)-based reasoning follows the traditional practice of scoring candidates
by perplexity. However, commonsense reasoning requires more than probability
estimation, which is biased by word frequency. This paper reconsiders the
nature of commonsense reasoning and proposes a novel commonsense reasoning
metric, Non-Replacement Confidence (NRC). Specifically, NRC operates on PLMs
pre-trained with the Replaced Token Detection (RTD) objective of ELECTRA,
whose corruption-detection signal reflects confidence in contextual
integrity, a property more relevant to commonsense reasoning than raw
probability. The proposed method boosts zero-shot performance on two
commonsense reasoning benchmarks and seven further commonsense
question-answering datasets. Our analysis shows that pre-endowed commonsense
knowledge, especially in RTD-based PLMs, is essential for downstream
reasoning.
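To make the metric concrete, below is a minimal sketch of how an RTD-based confidence score can be computed with an off-the-shelf ELECTRA discriminator via the HuggingFace transformers library. The checkpoint name and the aggregation (mean per-token log non-replacement probability, skipping special tokens) are illustrative assumptions; the paper's exact NRC formulation may differ.

```python
# Hedged sketch of an NRC-style score: rank candidate sentences by how
# confidently ELECTRA's RTD discriminator judges their tokens to be
# original (not replaced). The aggregation choice is an assumption.
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

MODEL = "google/electra-base-discriminator"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = ElectraForPreTraining.from_pretrained(MODEL)
model.eval()

@torch.no_grad()
def nrc_score(sentence: str) -> float:
    inputs = tokenizer(sentence, return_tensors="pt")
    # Discriminator logits: higher means "this token looks replaced".
    logits = model(**inputs).logits.squeeze(0)
    # sigmoid(-logit) = probability the token is original (not replaced).
    p_original = torch.sigmoid(-logits)
    # Skip [CLS]/[SEP]; mean log-confidence avoids a length bias when
    # comparing candidates of different lengths.
    return p_original[1:-1].log().mean().item()

candidates = [
    "He poured coffee into his cup.",
    "He poured coffee into his shoe.",
]
print(max(candidates, key=nrc_score))  # expect the commonsensical sentence
```

Unlike perplexity, which favors high-frequency words, this score asks whether each token fits its context, matching the contextual-integrity intuition described in the abstract.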
Related papers
- KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models [53.84677081899392]
KIEval is a Knowledge-grounded Interactive Evaluation framework for large language models.
It incorporates an LLM-powered "interactor" role for the first time to accomplish a dynamic contamination-resilient evaluation.
Extensive experiments on seven leading LLMs across five datasets validate KIEval's effectiveness and generalization.
arXiv Detail & Related papers (2024-02-23T01:30:39Z)
- Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Causality-aware Concept Extraction based on Knowledge-guided Prompting [17.4086571624748]
Concepts benefit natural language understanding but are far from complete in existing knowledge graphs (KGs).
Recently, pre-trained language models (PLMs) have been widely used in text-based concept extraction.
We propose equipping the PLM-based extractor with a knowledge-guided prompt as an intervention to alleviate concept bias.
arXiv Detail & Related papers (2023-05-03T03:36:20Z)
- A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification [0.491574468325115]
We present the first large-scale empirical study that enables benchmarking of confidence scoring functions.
That a simple softmax-response baseline performs best overall underlines the drastic shortcomings of current evaluation.
arXiv Detail & Related papers (2022-11-28T12:25:27Z)
- Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence [45.9949173746044]
We show that large-size pre-trained language models (PLMs) do not satisfy the logical negation property (LNP).
We propose a novel intermediate training task, named meaning-matching, designed to directly learn a meaning-text correspondence.
We find that the task enables PLMs to learn lexical semantic information.
arXiv Detail & Related papers (2022-05-08T08:37:36Z)
- Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View [37.625078897220305]
Prompt-based probing has been widely used in evaluating the abilities of pretrained language models (PLMs).
This paper investigates prompt-based probing from a causal view, highlights three critical biases that could induce biased results and conclusions, and proposes debiasing via causal intervention.
arXiv Detail & Related papers (2022-03-23T08:10:07Z)
- Causal Inference Principles for Reasoning about Commonsense Causality [93.19149325083968]
Commonsense causality reasoning aims at identifying plausible causes and effects in natural language descriptions that are deemed reasonable by an average person.
Existing work usually relies entirely on deep language models and is therefore potentially susceptible to confounding co-occurrences.
Motivated by classical causal principles, we articulate the central question of CCR and draw parallels between human subjects in observational studies and natural languages.
We propose a novel framework, ROCK, to Reason O(A)bout Commonsense K(C)ausality, which utilizes temporal signals as incidental supervision.
arXiv Detail & Related papers (2022-01-31T06:12:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.