PROPRES: Investigating the Projectivity of Presupposition with Various
Triggers and Environments
- URL: http://arxiv.org/abs/2312.08755v1
- Date: Thu, 14 Dec 2023 09:07:57 GMT
- Authors: Daiki Asami and Saku Sugawara
- Abstract summary: We introduce a new dataset, projectivity of presupposition (PROPRES).
Our human evaluation reveals that humans exhibit variable projectivity in some cases.
Our findings suggest that probing studies on pragmatic inferences should take extra care with human judgment variability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What makes a presupposition of an utterance -- information taken for granted
by its speaker -- different from other pragmatic inferences such as an
entailment is projectivity (e.g., the negative sentence "the boy did not stop
shedding tears" presupposes "the boy had shed tears before"). Projectivity may
vary depending on the combination of presupposition triggers and environments.
However, prior natural language understanding studies fail to take it into
account as they either use no human baseline or include only negation as an
entailment-canceling environment to evaluate models' performance. The current
study attempts to reconcile these issues. We introduce a new dataset,
projectivity of presupposition (PROPRES), which includes 12k premise-hypothesis
pairs crossing six triggers involving some lexical variety with five
environments. Our human evaluation reveals that humans exhibit variable
projectivity in some cases. However, the model evaluation shows that the
best-performing model, DeBERTa, does not fully capture it. Our findings suggest
that probing studies on pragmatic inferences should take extra care with
human judgment variability and the combination of linguistic items.
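The factorial design described above, crossing presupposition triggers with entailment-canceling environments to form premise-hypothesis pairs, can be sketched in a few lines. The trigger sentences and environment frames below are hypothetical stand-ins for illustration, not actual PROPRES items:

```python
from itertools import product

# Hypothetical examples; the actual PROPRES triggers and environments differ.
# Each trigger maps a sentence to the presupposition it carries.
triggers = {
    "stop": ("the boy stopped shedding tears", "the boy had shed tears before"),
    "again": ("the boy cried again", "the boy had cried before"),
}
# Entailment-canceling frames that embed the trigger sentence.
environments = {
    "negation": "It is not the case that {}.",
    "question": "Is it the case that {}?",
    "conditional": "If {}, then something follows.",
}

def build_pairs(triggers, environments):
    """Cross every trigger sentence with every environment; the
    presupposition becomes the hypothesis of the resulting pair."""
    pairs = []
    for (name, (sentence, presup)), (env, frame) in product(
        triggers.items(), environments.items()
    ):
        pairs.append({"trigger": name, "environment": env,
                      "premise": frame.format(sentence),
                      "hypothesis": presup})
    return pairs

pairs = build_pairs(triggers, environments)
print(len(pairs))  # 2 triggers x 3 environments = 6 pairs
```

Scaling the same cross product to six triggers and five environments, each with lexical variants, yields a grid of the kind the 12k-pair dataset instantiates.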
Related papers
- FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations [33.24762796282484]
We present FairPair, an evaluation framework for assessing differential treatment that occurs during ordinary usage.
Unlike prior work, our method factors in the inherent variability that comes from the generation process itself by measuring the sampling variability.
arXiv Detail & Related papers (2024-04-09T21:09:22Z) - Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - Using Artificial Populations to Study Psychological Phenomena in Neural
Models [0.0]
Investigation of cognitive behavior in language models must be conducted in an appropriate population for the results to be meaningful.
We leverage work in uncertainty estimation in a novel approach to efficiently construct experimental populations.
We provide theoretical grounding in the uncertainty estimation literature and motivation from current cognitive work regarding language models.
arXiv Detail & Related papers (2023-08-15T20:47:51Z) - Uncertainty-Aware Unlikelihood Learning Improves Generative Aspect
Sentiment Quad Prediction [52.05304897163256]
We propose a template-agnostic method to control the token-level generation.
Specifically, we introduce Monte Carlo dropout to understand the built-in uncertainty of pre-trained language models.
We further propose marginalized unlikelihood learning to suppress the uncertainty-aware mistake tokens.
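Monte Carlo dropout, as referenced above, keeps dropout active at inference time and treats the spread across repeated stochastic forward passes as an uncertainty estimate. A minimal NumPy sketch, using a toy linear layer in place of a pre-trained language model:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(x, weights, n_samples=100, p_drop=0.1):
    """Run repeated forward passes with dropout left on; the mean is the
    prediction and the standard deviation across samples estimates
    the model's built-in uncertainty."""
    outputs = []
    for _ in range(n_samples):
        mask = rng.random(weights.shape) > p_drop   # Bernoulli dropout mask
        dropped = weights * mask / (1.0 - p_drop)   # inverted-dropout scaling
        outputs.append(x @ dropped)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)

x = np.ones(4)
w = rng.normal(size=(4, 3))
mean, std = mc_dropout_predict(x, w)
print(mean.shape, std.shape)  # (3,) (3,)
```

Tokens whose outputs show high standard deviation would be the "uncertainty-aware mistake tokens" that the proposed marginalized unlikelihood learning suppresses.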
arXiv Detail & Related papers (2023-06-01T07:49:06Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Syntactic Surprisal From Neural Models Predicts, But Underestimates,
Human Processing Difficulty From Syntactic Ambiguities [19.659811811023374]
We propose a method for estimating syntactic predictability from a language model.
We find that treating syntactic predictability independently from lexical predictability indeed results in larger estimates of garden path effects.
Our results support the hypothesis that predictability is not the only factor responsible for the processing cost associated with garden path sentences.
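Surprisal, the quantity estimated above, is the negative log probability a model assigns to a word given its context; higher surprisal predicts greater processing difficulty. A minimal sketch with made-up probabilities for a disambiguating word in a garden-path versus a control sentence:

```python
import math

def surprisal(prob):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(prob)

# Hypothetical model probabilities, for illustration only.
print(surprisal(0.02))  # ~5.64 bits: unexpected garden-path continuation
print(surprisal(0.5))   # 1.0 bit: expected control continuation
```

Estimating the probability from a syntax-only model, rather than a full lexical model, is what separates syntactic from lexical predictability in the paper's analysis.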
arXiv Detail & Related papers (2022-10-21T18:30:56Z) - Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z) - Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards
Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z) - Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z) - Towards Understanding Sample Variance in Visually Grounded Language
Generation: Evaluations and Observations [67.4375210552593]
We design experiments to understand an important but often ignored problem in visually grounded language generation.
Given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance?
We show that it is of paramount importance to report variance in experiments, and that human-generated references can vary drastically across datasets/tasks, revealing the nature of each task.
arXiv Detail & Related papers (2020-10-07T20:45:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.