Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models
- URL: http://arxiv.org/abs/2311.04659v1
- Date: Wed, 8 Nov 2023 13:00:06 GMT
- Title: Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models
- Authors: Yiyuan Li, Rakesh R. Menon, Sayan Ghosh, Shashank Srivastava
- Abstract summary: We introduce QuRe, a crowd-sourced dataset of human-annotated generalized quantifiers in Wikipedia sentences featuring percentage-equipped predicates.
We explore quantifier comprehension in language models using PRESQUE, a framework that combines natural language inference and the Rational Speech Acts framework.
- Score: 22.757306452760112
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generalized quantifiers (e.g., few, most) indicate the
proportions with which predicates are satisfied (for example, some apples are
red). One way to interpret quantifier semantics is to bind these proportions
explicitly to percentage scopes (e.g., 30%-40% of apples are red). This approach can be
helpful for tasks like logic formalization and surface-form quantitative
reasoning (Gordon and Schubert, 2010; Roy et al., 2015). However, it remains
unclear if recent foundation models possess this ability, as they lack direct
training signals. To explore this, we introduce QuRe, a crowd-sourced dataset
of human-annotated generalized quantifiers in Wikipedia sentences featuring
percentage-equipped predicates. We explore quantifier comprehension in language
models using PRESQUE, a framework that combines natural language inference and
the Rational Speech Acts framework. Experimental results on the HVD dataset and
QuRe illustrate that PRESQUE, employing pragmatic reasoning, performs 20%
better than a literal reasoning baseline when predicting quantifier percentage
scopes, with no additional training required.
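The abstract describes PRESQUE only at a high level, so here is a minimal sketch of how Rational Speech Acts (RSA) reasoning can sharpen a literal, NLI-derived reading of a quantifier into a percentage scope. The quantifier inventory, the 20%-wide scopes, the literal-listener scores (hand-written stand-ins for NLI entailment probabilities), and the rationality parameter alpha are all hypothetical assumptions for illustration, not the authors' implementation.

```python
# Minimal RSA sketch over quantifier percentage scopes (illustrative only).
import numpy as np

quantifiers = ["none", "few", "some", "most", "all"]
# World states: coarse 20%-wide percentage scopes.
scopes = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]

# Hypothetical literal listener L0[u, w] ~ P_NLI(scope w | quantifier u),
# standing in for entailment scores an NLI model would assign to hypotheses
# like "20%-40% of apples are red" given "Some apples are red".
L0 = np.array([
    [0.90, 0.05, 0.02, 0.02, 0.01],  # none
    [0.30, 0.50, 0.15, 0.04, 0.01],  # few
    [0.05, 0.35, 0.35, 0.20, 0.05],  # some
    [0.01, 0.04, 0.20, 0.45, 0.30],  # most
    [0.01, 0.02, 0.02, 0.15, 0.80],  # all
])
L0 = L0 / L0.sum(axis=1, keepdims=True)  # normalize over world states

alpha = 2.0  # hypothetical speaker rationality
# Pragmatic speaker: S1(u | w) proportional to L0(w | u)^alpha,
# normalized over utterances for each world.
S1 = L0 ** alpha
S1 = S1 / S1.sum(axis=0, keepdims=True)

# Pragmatic listener: L1(w | u) proportional to S1(u | w) * P(w),
# with a uniform prior over scopes.
L1 = S1 / S1.sum(axis=1, keepdims=True)

u = quantifiers.index("some")
for w, p_lit, p_prag in zip(scopes, L0[u], L1[u]):
    print(f"{w}: literal={p_lit:.2f}  pragmatic={p_prag:.2f}")
```

With these toy numbers, the pragmatic listener moves probability mass for "some" away from the 80-100% scope, since a rational speaker who observed that world would more likely have said "all". This scalar-implicature-style strengthening is the kind of effect that pragmatic reasoning contributes over a literal baseline.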
Related papers
- Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers [14.797001158310092]
We argue that distributional semantics models struggle with truth-conditional reasoning and symbolic processing.
Contrary to expectations, we find that LLMs align more closely with human judgements on exact quantifiers versus vague ones.
arXiv Detail & Related papers (2024-10-17T19:28:35Z)
- "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z)
- Evaluating statistical language models as pragmatic reasoners [39.72348730045737]
We evaluate the capacity of large language models to infer meanings of pragmatic utterances.
We find that LLMs can derive context-grounded, human-like distributions over the interpretations of several complex pragmatic utterances.
These results shed light on the inferential capacity of statistical language models and on their use in pragmatic and semantic parsing applications.
arXiv Detail & Related papers (2023-05-01T18:22:10Z)
- ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation [63.33465936588327]
We propose a modified version of the compositional generalization benchmark COGS.
Our results reaffirm the importance of compositional generalization and careful benchmark task design.
arXiv Detail & Related papers (2023-03-24T00:01:24Z)
- Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks [5.818232893255398]
We rely on Generalized Quantifier Theory for language-independent representations of the semantics of quantifier words (a toy truth-conditional sketch follows this list).
We find that quantifiers are pervasive in NLU benchmarks, and their occurrence at test time is associated with performance drops.
Multilingual models also exhibit unsatisfying quantifier reasoning abilities, though not necessarily worse ones for non-English languages.
arXiv Detail & Related papers (2022-04-22T10:21:46Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Linguists Who Use Probabilistic Models Love Them: Quantification in Functional Distributional Semantics [12.640283469603355]
I show how the previous formulation gives trivial truth values when a precise quantifier is used with vague predicates.
I propose an improved account, avoiding this problem by treating a vague predicate as a distribution over precise predicates.
I explain how the generic quantifier can be both pragmatically complex and yet computationally simpler than precise quantifiers.
arXiv Detail & Related papers (2020-06-04T16:48:45Z)
- Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions [87.33156149634392]
We critically examine RefCOCOg, a standard benchmark for visual referring expression recognition.
We show that 83.7% of test instances do not require reasoning on linguistic structure.
We propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT.
arXiv Detail & Related papers (2020-05-04T17:09:15Z)
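As referenced in the Generalized Quantifiers as a Source of Error entry above, Generalized Quantifier Theory treats quantifier words as relations between a restrictor set and a scope set, which makes the representation language-independent by construction. A toy truth-conditional sketch, illustrative only and not taken from any of the listed papers:

```python
# Toy Generalized Quantifier Theory: a quantifier denotes a relation
# between a restrictor set A and a scope set B.
def some(A, B):  return len(A & B) > 0
def no(A, B):    return len(A & B) == 0
def every(A, B): return A <= B                   # A is a subset of B
def most(A, B):  return len(A & B) > len(A - B)  # one standard analysis

apples = {"a1", "a2", "a3", "a4"}
red = {"a1", "a2", "a3"}

print(some(apples, red), every(apples, red), most(apples, red))
# True False True: three of four apples are red
```

Because the relations are defined purely over sets, the same denotations apply whatever language supplies the quantifier word, which is why such representations are useful for multilingual NLU analysis.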