Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers
- URL: http://arxiv.org/abs/2410.13984v1
- Date: Thu, 17 Oct 2024 19:28:35 GMT
- Title: Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers
- Authors: Zhang Enyan, Zewei Wang, Michael A. Lepori, Ellie Pavlick, Helena Aparicio
- Abstract summary: It is often argued that distributional semantics models should struggle with truth-conditional reasoning and symbolic processing.
Contrary to this expectation, we find that LLMs align more closely with human judgements on exact quantifiers than on vague ones.
- Abstract: Distributional semantics is the linguistic theory that a word's meaning can be derived from its distribution in natural language (i.e., its use). Language models are commonly viewed as an implementation of distributional semantics, as they are optimized to capture the statistical features of natural language. It is often argued that distributional semantics models should excel at capturing graded/vague meaning based on linguistic conventions, but struggle with truth-conditional reasoning and symbolic processing. We evaluate this claim with a case study on vague (e.g. "many") and exact (e.g. "more than half") quantifiers. Contrary to expectations, we find that, across a broad range of models of various types, LLMs align more closely with human judgements on exact quantifiers versus vague ones. These findings call for a re-evaluation of the assumptions underpinning what distributional semantics models are, as well as what they can capture.
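To make the comparison concrete, here is a minimal sketch, assuming GPT-2 as a stand-in model and a hypothetical dot-counting scenario, of how one could score a vague versus an exact quantifier statement with a language model. This is not the paper's actual stimulus set, model list, or human-judgement protocol; those are described in the full text.

```python
# Minimal sketch (not the paper's protocol): score quantified statements with an LLM.
# Assumptions: GPT-2 as a stand-in model; the dot-counting context is hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(text: str) -> float:
    """Total log-probability the model assigns to the token sequence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens
    return -out.loss.item() * (ids.shape[1] - 1)

# Hypothetical scenario: 12 of 20 dots are red (60%).
context = "There are 20 dots and 12 of them are red."
statements = {
    "vague quantifier": context + " Many of the dots are red.",
    "exact quantifier": context + " More than half of the dots are red.",
}

for label, sentence in statements.items():
    print(label, round(sentence_logprob(sentence), 2))
```

Alignment with humans would then amount to correlating such model scores with human truth-value judgements across scenarios that vary the underlying proportion, separately for vague and exact quantifiers.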
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure [2.530495315660486]
We investigate the degree to which pre-trained Transformer-based large language models represent relationships between contexts.
We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts.
However, LLMs fail at generalizations between related contexts that have not been observed during pre-training.
arXiv Detail & Related papers (2023-11-08T18:58:43Z)
- Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models [22.757306452760112]
We introduce QuRe, a crowd-sourced dataset of human-annotated generalized quantifiers in Wikipedia sentences featuring percentage-equipped predicates.
We explore quantifier comprehension in language models using PRESQUE, a framework that combines natural language inference and the Rational Speech Acts framework.
arXiv Detail & Related papers (2023-11-08T13:00:06Z)
- "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analyzing sentence meaning structure.
We find that these models can reliably reproduce the basic format of AMR and often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models can capture aspects of semantic structure out of the box, but key limitations remain in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z)
- Meaning Representations from Trajectories in Autoregressive Models [106.63181745054571]
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
arXiv Detail & Related papers (2023-10-23T04:35:58Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Evaluating statistical language models as pragmatic reasoners [39.72348730045737]
We evaluate the capacity of large language models to infer meanings of pragmatic utterances.
We find that LLMs can derive context-grounded, human-like distributions over the interpretations of several complex pragmatic utterances.
These results shed light on the inferential capacity of statistical language models and inform their use in pragmatic and semantic parsing applications.
arXiv Detail & Related papers (2023-05-01T18:22:10Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Montague semantics and modifier consistency measurement in neural language models [1.6799377888527685]
This work proposes a methodology for measuring compositional behavior in contemporary language models.
Specifically, we focus on adjectival modifier phenomena in adjective-noun phrases.
Our experimental results indicate that current neural language models conform to the expected linguistic theories only to a limited extent.
arXiv Detail & Related papers (2022-10-10T18:43:16Z)
- Testing Pre-trained Language Models' Understanding of Distributivity via Causal Mediation Analysis [13.07356367140208]
We introduce DistNLI, a new diagnostic dataset for natural language inference.
We find that the extent of models' understanding is associated with model size and vocabulary size.
arXiv Detail & Related papers (2022-09-11T00:33:28Z)
- A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and certainly not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z)
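For readers unfamiliar with the Representational Similarity Analysis methodology mentioned in the last entry: it compares two representational spaces by correlating their pairwise (dis)similarity structure over the same items. The sketch below is a generic illustration with hypothetical toy vectors, not the cited paper's evaluation code.

```python
# Generic Representational Similarity Analysis (RSA) sketch; toy data only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(space_a: np.ndarray, space_b: np.ndarray) -> float:
    """Spearman correlation between the condensed cosine-distance matrices
    of two embedding spaces whose rows are aligned to the same word list."""
    rdm_a = pdist(space_a, metric="cosine")  # upper-triangle pairwise distances
    rdm_b = pdist(space_b, metric="cosine")
    rho, _ = spearmanr(rdm_a, rdm_b)
    return float(rho)

# Hypothetical example: 5 words in a 50-d static space and a 768-d contextual space.
rng = np.random.default_rng(0)
static_vectors = rng.normal(size=(5, 50))
contextual_vectors = rng.normal(size=(5, 768))
print(rsa_score(static_vectors, contextual_vectors))
```

A high correlation indicates that the two spaces encode similar relational structure even when their dimensionalities and absolute geometries differ.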