Modeling Event Plausibility with Consistent Conceptual Abstraction
- URL: http://arxiv.org/abs/2104.10247v1
- Date: Tue, 20 Apr 2021 21:08:32 GMT
- Title: Modeling Event Plausibility with Consistent Conceptual Abstraction
- Authors: Ian Porada, Kaheer Suleman, Adam Trischler, and Jackie Chi Kit Cheung
- Abstract summary: We show that Transformer-based plausibility models are markedly inconsistent across the conceptual classes of a lexical hierarchy.
We present a simple post-hoc method of forcing model consistency that improves correlation with human plausibility.
- Score: 29.69958315418181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding natural language requires common sense, one aspect of which is
the ability to discern the plausibility of events. While distributional models
-- most recently pre-trained, Transformer language models -- have demonstrated
improvements in modeling event plausibility, their performance still falls
short of humans'. In this work, we show that Transformer-based plausibility
models are markedly inconsistent across the conceptual classes of a lexical
hierarchy, inferring that "a person breathing" is plausible while "a dentist
breathing" is not, for example. We find this inconsistency persists even when
models are softly injected with lexical knowledge, and we present a simple
post-hoc method of forcing model consistency that improves correlation with
human plausibility judgements.
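The inconsistency and the post-hoc fix can be illustrated with a small sketch. The snippet below is not the authors' model: `plausibility()` is a hypothetical stand-in scorer, the lexical hierarchy comes from WordNet via NLTK, and consistency is enforced by a simple maximum over a concept's hypernym chain, which is only one rough reading of "forcing model consistency".

```python
# Minimal sketch: probing plausibility consistency across a WordNet hypernym chain
# and applying a naive post-hoc consistency fix. The scorer is a placeholder,
# not the paper's Transformer-based model.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def plausibility(subject: str, verb: str) -> float:
    """Placeholder plausibility scorer in [0, 1]; swap in a real model here."""
    return 0.9 if subject == "person" else 0.4  # illustrative inconsistency

def hypernym_chain(noun: str):
    """The noun plus the WordNet hypernyms along its first sense's first path
    (e.g. dentist -> medical practitioner -> ... -> person -> ... -> entity)."""
    synsets = wn.synsets(noun, pos=wn.NOUN)
    chain = [noun]
    syn = synsets[0] if synsets else None
    while syn is not None:
        hypers = syn.hypernyms()
        syn = hypers[0] if hypers else None
        if syn is not None:
            chain.append(syn.lemma_names()[0].replace("_", " "))
    return chain

def consistent_plausibility(noun: str, verb: str) -> float:
    """Post-hoc fix (one naive variant): take the maximum plausibility over the
    concept and its hypernyms for the same predicate."""
    return max(plausibility(n, verb) for n in hypernym_chain(noun))

for noun in ("person", "dentist"):
    print(noun, "breathing:",
          plausibility(noun, "breathe"),
          "->", consistent_plausibility(noun, "breathe"))
```

In the abstract's example, a model may score "a person breathing" as plausible but "a dentist breathing" as not; in this sketch, propagating scores along the hypernym chain removes that particular inconsistency.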
Related papers
- A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles [0.06554326244334868]
We evaluate large language models' sensitivity to argument roles by replicating psycholinguistic studies on human argument role processing.
We find that language models are able to distinguish verbs that appear in plausible and implausible contexts, where plausibility is determined through the relation between the verb and its preceding arguments.
However, the models do not show the selective, role-sensitive patterns that humans exhibit in real-time processing, indicating that language models' capacity to detect verb plausibility does not arise from the same mechanism that underlies human real-time sentence processing.
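A rough sketch of the general setup behind this kind of probe: score a verb in a plausible context against the same verb in a role-reversed (implausible) context with a language model. GPT-2 mean token negative log-likelihood is used here as a generic plausibility proxy; the sentences, model, and scoring choice are illustrative assumptions, not the paper's materials or protocol.

```python
# Sketch: comparing LM scores for plausible vs. role-reversed sentences.
# GPT-2 NLL is a generic plausibility proxy here, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def nll(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # labels are shifted internally
    return out.loss.item()  # mean token NLL; lower = more expected

plausible = "The customer that the waiter served left a tip."
role_reversed = "The waiter that the customer served left a tip."
for name, s in (("plausible", plausible), ("role-reversed", role_reversed)):
    print(f"{name:>13}: NLL = {nll(s):.3f}")
```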
arXiv Detail & Related papers (2024-10-21T16:05:58Z) - Representation Surgery: Theory and Practice of Affine Steering [72.61363182652853]
Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text.
One natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations.
This paper investigates the formal and empirical properties of steering functions.
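In its simplest form, an affine steering function maps a representation h to Wh + b. The sketch below applies such a map to a stand-in hidden-state tensor; the dimensions and near-identity parameters are placeholders, and fitting W and b to suppress a target attribute (the substantive part) is out of scope here and not taken from the paper.

```python
# Sketch: applying an affine steering function h -> W h + b to hidden states.
# W and b are random placeholders; in practice they would be fitted so the
# steered representations suppress an undesired attribute (e.g., toxicity).
import torch

hidden_dim = 768
W = torch.eye(hidden_dim) + 0.01 * torch.randn(hidden_dim, hidden_dim)
b = torch.zeros(hidden_dim)

def steer(hidden_states: torch.Tensor) -> torch.Tensor:
    """Affine steering applied to a (batch, seq_len, hidden_dim) tensor."""
    return hidden_states @ W.T + b

h = torch.randn(2, 5, hidden_dim)   # stand-in for a Transformer layer's output
print(steer(h).shape)               # torch.Size([2, 5, 768])
```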
arXiv Detail & Related papers (2024-02-15T00:20:30Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks [12.57650361978445]
Humans read texts at a varying pace, while machine learning models treat each token in the same way.
In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers.
We find that, interestingly, the fixation durations predicted by the neural networks bear some resemblance to human fixations.
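One way to read "longer fixations, more computation" is to let a token's predicted fixation duration control how many recurrent steps it receives. The sketch below implements that reading with a plain GRU cell; it is an illustrative interpretation, not the parallel architecture described in the paper.

```python
# Sketch: spend more recurrent steps on tokens with longer (predicted) fixations.
# An illustrative reading of "fixation-guided" computation, not the paper's model.
import torch
import torch.nn as nn

class FixationGuidedRNN(nn.Module):
    def __init__(self, emb_dim: int = 32, hid_dim: int = 64, max_steps: int = 3):
        super().__init__()
        self.cell = nn.GRUCell(emb_dim, hid_dim)
        self.max_steps = max_steps

    def forward(self, embeddings: torch.Tensor, fixations: torch.Tensor):
        # embeddings: (seq_len, emb_dim); fixations: (seq_len,) normalized to [0, 1]
        h = torch.zeros(self.cell.hidden_size)
        for emb, fix in zip(embeddings, fixations):
            steps = 1 + int(fix.item() * (self.max_steps - 1))  # longer fixation -> more steps
            for _ in range(steps):
                h = self.cell(emb.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
        return h

model = FixationGuidedRNN()
emb = torch.randn(6, 32)                             # six token embeddings
fix = torch.tensor([0.1, 0.9, 0.2, 0.8, 0.1, 0.5])   # stand-in fixation durations
print(model(emb, fix).shape)                         # torch.Size([64])
```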
arXiv Detail & Related papers (2023-10-31T21:32:11Z) - NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks, depending on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
arXiv Detail & Related papers (2022-11-08T16:37:34Z) - Lexical Generalization Improves with Larger Models and Longer Training [42.024050065980845]
We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension.
We find that larger models are much less susceptible to adopting the lexical overlap heuristic.
arXiv Detail & Related papers (2022-10-23T09:20:11Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - BERT & Family Eat Word Salad: Experiments with Text Understanding [17.998891912502092]
We study the response of large models from the BERT family to incoherent inputs that should confuse any model that claims to understand natural language.
Experiments show that state-of-the-art models consistently fail to recognize them as ill-formed, and instead produce high confidence predictions on them.
We show that if models are explicitly trained to recognize invalid inputs, they can be robust to such attacks without a drop in performance.
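The flavor of this probe is easy to reproduce: shuffle a sentence into "word salad" and check whether an off-the-shelf classifier still returns a confident prediction. The default sentiment pipeline below is a stand-in classifier, not one of the models or tasks studied in the paper.

```python
# Sketch: probing a classifier with shuffled ("word salad") inputs.
# Uses a default sentiment pipeline as a stand-in; not the paper's models or tasks.
import random
from transformers import pipeline

clf = pipeline("sentiment-analysis")  # downloads a default English model

text = "the movie was a genuinely moving and beautifully acted drama"
words = text.split()
random.seed(0)
salad = " ".join(random.sample(words, len(words)))  # incoherent permutation

for s in (text, salad):
    pred = clf(s)[0]
    print(f"{pred['label']:>8}  {pred['score']:.3f}  {s}")
```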
arXiv Detail & Related papers (2021-01-10T01:32:57Z) - Concept Bottleneck Models [79.91795150047804]
State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs".
We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label.
On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models.
arXiv Detail & Related papers (2020-07-09T07:47:28Z)
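The concept bottleneck idea from the last entry above is simple to sketch: the input is first mapped to a vector of human-interpretable concept predictions, and the label is predicted only from those concepts, which allows a concept to be inspected or overridden at test time. The layer sizes and single linear label head below are illustrative choices, not the paper's task-specific models.

```python
# Sketch of a concept bottleneck model: input -> concepts -> label.
# Layer sizes are illustrative; the paper's models are task-specific.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.concept_net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                         nn.Linear(128, n_concepts))
        self.label_net = nn.Linear(n_concepts, n_classes)  # sees only concepts

    def forward(self, x, concept_override=None):
        concepts = torch.sigmoid(self.concept_net(x))   # e.g. "bone spur present"
        if concept_override is not None:                 # test-time intervention
            concepts = concept_override
        return self.label_net(concepts), concepts

model = ConceptBottleneck(in_dim=512, n_concepts=10, n_classes=4)
x = torch.randn(8, 512)
logits, concepts = model(x)
print(logits.shape, concepts.shape)  # torch.Size([8, 4]) torch.Size([8, 10])
```

Training would typically supervise both the concept predictions (which are provided at training time) and the final label.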
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.