Experimental Pragmatics with Machines: Testing LLM Predictions for the Inferences of Plain and Embedded Disjunctions
- URL: http://arxiv.org/abs/2405.05776v1
- Date: Thu, 9 May 2024 13:54:15 GMT
- Title: Experimental Pragmatics with Machines: Testing LLM Predictions for the Inferences of Plain and Embedded Disjunctions
- Authors: Polina Tsvilodub, Paul Marty, Sonia Ramotowska, Jacopo Romoli, Michael Franke
- Abstract summary: We focus on three inferences of plain and embedded disjunctions, and compare them with regular scalar implicatures.
We investigate this comparison from the novel perspective of the predictions of state-of-the-art large language models.
The results of our best-performing models mostly align with those of humans, both in the large differences we find between those inferences and implicatures and in the fine-grained distinctions among different aspects of those inferences.
- Score: 4.753535328327316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human communication is based on a variety of inferences that we draw from sentences, often going beyond what is literally said. While there is wide agreement on the basic distinction between entailment, implicature, and presupposition, the status of many inferences remains controversial. In this paper, we focus on three inferences of plain and embedded disjunctions, and compare them with regular scalar implicatures. We investigate this comparison from the novel perspective of the predictions of state-of-the-art large language models, using the same experimental paradigms as recent studies investigating the same inferences with humans. The results of our best-performing models mostly align with those of humans, both in the large differences we find between those inferences and implicatures and in the fine-grained distinctions among different aspects of those inferences.
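Concretely, the paradigm amounts to presenting an utterance and eliciting endorsement of a candidate inference. Below is a minimal sketch with invented items; `llm_p_yes`, the prompt wording, and the examples are assumptions for illustration, not the authors' materials.

```python
# Sketch of an inference-judgment trial; items and prompt are invented.
ITEMS = [
    # (utterance, candidate inference, inference type)
    ("Mary saw John or Bill.",
     "the speaker doesn't know which of the two Mary saw",
     "ignorance inference"),
    ("Mary saw John or Bill.",
     "Mary didn't see both John and Bill",
     "scalar implicature"),
    ("Every student read a paper or a book.",
     "some student read a paper and some student read a book",
     "distributive inference (embedded disjunction)"),
]

PROMPT = ('Someone says: "{utterance}"\n'
          "Would you conclude from this that {inference}?\n"
          "Answer yes or no.")

def llm_p_yes(prompt: str) -> float:
    """Placeholder: probability the model answers 'yes'; wire in a real
    backend here, e.g. comparing log-probabilities of 'yes' vs. 'no'."""
    raise NotImplementedError

for utterance, inference, kind in ITEMS:
    print(f"--- {kind} ---")
    print(PROMPT.format(utterance=utterance, inference=inference), end="\n\n")
```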
Related papers
- Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility [7.183662547358301]
We examine whether large language models process language similarly to humans.
We find that some LLMs do quantitatively and qualitatively reflect human-like asymmetries between production and interpretation.
arXiv Detail & Related papers (2025-03-21T23:25:42Z)
- Large Language Models Often Say One Thing and Do Another [49.22262396351797]
We develop a novel evaluation benchmark called the Words and Deeds Consistency Test (WDCT).
The benchmark establishes a strict correspondence between word-based and deed-based questions across different domains.
The evaluation results reveal a widespread inconsistency between words and deeds across different LLMs and domains.
arXiv Detail & Related papers (2025-03-10T07:34:54Z)
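As a rough illustration of the benchmark's logic, word/deed consistency can be scored as agreement between paired answers; the item format below is an assumption, not the benchmark's actual schema.

```python
from dataclasses import dataclass

@dataclass
class WDCTItem:
    word_answer: str   # the option the model *says* it would choose
    deed_answer: str   # the option the model *does* choose in a scenario

def consistency_rate(items: list[WDCTItem]) -> float:
    """Fraction of items where stated word and observed deed agree."""
    return sum(i.word_answer == i.deed_answer for i in items) / len(items)

items = [
    WDCTItem(word_answer="A", deed_answer="A"),
    WDCTItem(word_answer="B", deed_answer="A"),  # says one thing, does another
]
print(consistency_rate(items))  # 0.5
```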
- Causal Inference with Large Language Model: A Survey [5.651037052334014]
Causal inference has been a pivotal challenge across diverse domains such as medicine and economics.
Recent advancements in natural language processing (NLP) have introduced promising opportunities for traditional causal inference tasks.
arXiv Detail & Related papers (2024-09-15T18:43:11Z)
- Statistical Uncertainty in Word Embeddings: GloVe-V [35.04183792123882]
We introduce a method to obtain approximate, easy-to-use, and scalable reconstruction error variance estimates for GloVe.
To demonstrate the value of embeddings with variance (GloVe-V), we illustrate how our approach enables principled hypothesis testing in core word embedding tasks.
arXiv Detail & Related papers (2024-06-18T00:35:02Z)
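The kind of hypothesis test this enables can be sketched generically: given two similarity estimates and standard errors derived from the embedding variances, test whether their difference is distinguishable from zero. The z-test and its inputs below are illustrative assumptions, not GloVe-V's actual interface.

```python
import math
from statistics import NormalDist

def z_test_difference(sim_ab: float, se_ab: float,
                      sim_ac: float, se_ac: float) -> float:
    """Two-sided p-value for H0: sim(a,b) == sim(a,c), assuming independent,
    approximately normal similarity estimates."""
    z = (sim_ab - sim_ac) / math.sqrt(se_ab**2 + se_ac**2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# e.g.: is "doctor" reliably closer to "nurse" than to "engineer"?
print(z_test_difference(sim_ab=0.62, se_ab=0.03, sim_ac=0.55, se_ac=0.04))
```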
- Predictive Churn with the Set of Good Models [61.00058053669447]
This paper explores connections between two seemingly unrelated concepts of predictive inconsistency.
The first, known as predictive multiplicity, occurs when models that perform similarly produce conflicting predictions for individual samples.
The second concept, predictive churn, examines the differences in individual predictions before and after model updates.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
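Both quantities are easy to state in code; the following is a minimal reading of the two definitions above, not the paper's formal setup.

```python
import numpy as np

def churn(preds_before: np.ndarray, preds_after: np.ndarray) -> float:
    """Predictive churn: fraction of samples whose label flips after an update."""
    return float(np.mean(preds_before != preds_after))

def multiplicity(preds_by_model: np.ndarray) -> float:
    """Predictive multiplicity: fraction of samples on which a set of
    similarly-performing models disagree."""
    disagree = (preds_by_model != preds_by_model[0]).any(axis=0)
    return float(np.mean(disagree))

before = np.array([0, 1, 1, 0, 1])
after = np.array([0, 1, 0, 0, 1])
print(churn(before, after))                     # 0.2
print(multiplicity(np.array([before, after])))  # 0.2
```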
- Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z)
- UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z)
- Studying and improving reasoning in humans and machines [0.0]
We investigate and compare reasoning in large language models (LLM) and humans.
Our results show that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, intuition-based human reasoning.
arXiv Detail & Related papers (2023-09-21T21:02:05Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
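Hypothetical examples of the three item types; the actual evaluation sets and scoring protocol are the paper's, not this sketch's.

```python
# (premise, hypothesis, expected label) for each targeted inference type
TRIVIAL_INFERENCES = [
    # (i) grammatically-specified entailment (e.g., a cleft)
    ("It was John who broke the vase.", "John broke the vase.", "entailment"),
    # (ii) evidential adverb of uncertainty blocks the entailment
    ("Supposedly, John broke the vase.", "John broke the vase.", "non-entailment"),
    # (iii) upward-monotone entailment: "dogs" -> "animals"
    ("Some dogs are barking.", "Some animals are barking.", "entailment"),
]

for premise, hypothesis, gold in TRIVIAL_INFERENCES:
    prompt = (f"Premise: {premise}\nHypothesis: {hypothesis}\n"
              "Does the premise entail the hypothesis? Answer yes or no.")
    print(gold, "|", prompt.replace("\n", " "))
```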
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
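The contrastive idea can be sketched with input-gradient saliency on a toy model: differentiate the logit difference between the predicted token and a foil, so the attribution explains "why A rather than B". This illustrates the general technique only, not the paper's exact method.

```python
import torch

torch.manual_seed(0)
vocab, dim = 50, 16
emb = torch.nn.Embedding(vocab, dim)
head = torch.nn.Linear(dim, vocab)

tokens = torch.tensor([3, 7, 12])           # toy input sequence
x = emb(tokens)                             # (3, dim) token embeddings
x.retain_grad()
logits = head(x.mean(dim=0))                # toy "LM": bag-of-embeddings

target, foil = 5, 9                         # e.g. "is" vs. "are"
(logits[target] - logits[foil]).backward()  # contrastive objective
saliency = x.grad.norm(dim=-1)              # per-token contrastive attribution
print(saliency)
```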
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To evaluate faithfulness, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
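A sketch of the removal-based criterion as summarized above: delete the tokens an interpretation ranks highest and measure the drop in model confidence. All names and details here are illustrative.

```python
from typing import Callable, Sequence

def removal_faithfulness(tokens: Sequence[str],
                         importance: Sequence[float],
                         predict_proba: Callable[[Sequence[str]], float],
                         k: int = 2) -> float:
    """Confidence drop after removing the k highest-attributed tokens;
    larger drops suggest the interpretation tracks what the model relies on."""
    top = set(sorted(range(len(tokens)), key=lambda i: -importance[i])[:k])
    kept = [t for i, t in enumerate(tokens) if i not in top]
    return predict_proba(tokens) - predict_proba(kept)
```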
- Feedback in Imitation Learning: Confusion on Causality and Covariate Shift [12.93527098342393]
We argue that conditioning policies on previous actions leads to a dramatic divergence between "held out" error and performance of the learner in situ.
We analyze existing benchmarks used to test imitation learning approaches.
We find, in a surprising contrast with previous literature, that naive behavioral cloning provides excellent results.
arXiv Detail & Related papers (2021-02-04T20:18:56Z)
- Multi-sense embeddings through a word sense disambiguation process [2.2344764434954256]
Most Suitable Sense Annotation (MSSA) disambiguates and annotates each word by its specific sense, considering the semantic effects of its context.
We test our approach on six different benchmarks for the word similarity task, showing that our approach can produce state-of-the-art results.
arXiv Detail & Related papers (2021-01-21T16:22:34Z)
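The word-similarity evaluation protocol itself (standard, and separate from MSSA) is easy to sketch: compare cosine similarities of the learned embeddings against human ratings via Spearman correlation.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(embeddings: dict, pairs: list) -> float:
    """Spearman rho between model similarities and human gold ratings."""
    model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in pairs]
    gold = [g for _, _, g in pairs]
    rho, _ = spearmanr(model, gold)
    return rho

rng = np.random.default_rng(0)  # random stand-ins for real sense embeddings
emb = {w: rng.normal(size=8) for w in ["car", "automobile", "train", "apple"]}
pairs = [("car", "automobile", 9.5), ("car", "train", 6.3), ("car", "apple", 1.2)]
print(evaluate(emb, pairs))
```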
- Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs [57.74359320513427]
Methods have been proposed for pretraining vision and language BERTs to tackle challenges at the intersection of these two key areas of AI.
We study the differences between the two main categories of models, single-stream and dual-stream encoders, and show how they can be unified under a single theoretical framework.
We conduct controlled experiments to discern the empirical differences between five V&L BERTs.
arXiv Detail & Related papers (2020-11-30T18:55:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.