Generics are puzzling. Can language models find the missing piece?
- URL: http://arxiv.org/abs/2412.11318v1
- Date: Sun, 15 Dec 2024 21:30:21 GMT
- Title: Generics are puzzling. Can language models find the missing piece?
- Authors: Gustavo Cilleruelo Calderón, Emily Allaway, Barry Haddow, Alexandra Birch
- Abstract summary: We study the implicit quantification and context-sensitivity of generics by leveraging language models as models of language.
We create ConGen, a dataset of 2873 naturally occurring generic and quantified sentences in context.
Our experiments show generics are more context-sensitive than determiner quantifiers and about 20% of naturally occurring generics we analyze express weak generalisations.
- Score: 70.14604603488178
- Abstract: Generic sentences express generalisations about the world without explicit quantification. Although generics are central to everyday communication, building a precise semantic framework has proven difficult, in part because speakers use generics to generalise properties with widely different statistical prevalence. In this work, we study the implicit quantification and context-sensitivity of generics by leveraging language models as models of language. We create ConGen, a dataset of 2873 naturally occurring generic and quantified sentences in context, and define p-acceptability, a metric based on surprisal that is sensitive to quantification. Our experiments show generics are more context-sensitive than determiner quantifiers and about 20% of naturally occurring generics we analyze express weak generalisations. We also explore how human biases in stereotypes can be observed in language models.
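The abstract defines p-acceptability only as a surprisal-based metric, so the exact formula is not recoverable from this listing. The snippet below is a minimal sketch of the underlying quantity: total sentence surprisal under a causal language model, compared between a bare generic and overtly quantified paraphrases. The model choice (gpt2) and the comparison loop are illustrative assumptions, not the paper's implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; the paper's actual models may differ.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisal(text: str) -> float:
    """Total surprisal (negative log-likelihood, in nats) of `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model return mean token-level cross-entropy
        mean_nll = lm(ids, labels=ids).loss.item()
    return mean_nll * (ids.size(1) - 1)  # mean over predicted tokens -> sum

# Compare a generic against quantified variants: the quantifier whose
# paraphrase costs the least extra surprisal hints at the implied
# quantification of the generic.
generic = "Ducks lay eggs."
base = surprisal(generic)
for q in ["All", "Most", "Some"]:
    print(q, surprisal(f"{q} ducks lay eggs.") - base)
```

A context-sensitivity probe in the paper's spirit would prepend each sentence's document context before scoring, so the same generic can receive different implied quantifiers in different contexts.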
Related papers
- Evaluating Structural Generalization in Neural Machine Translation [13.880151307013318]
We construct SGET, a dataset covering various types of compositional generalization with controlled words and sentence structures.
We show that neural machine translation models struggle more in structural generalization than in lexical generalization.
We also find different performance trends in semantic parsing and machine translation, which indicates the importance of evaluations across various tasks.
arXiv Detail & Related papers (2024-06-19T09:09:11Z)
- GeniL: A Multilingual Dataset on Generalizing Language [19.43611224855484]
Current methods to assess the presence of stereotypes in generated language rely on simple template- or co-occurrence-based measures.
We argue that understanding the sentential context is crucial for detecting instances of generalization.
We build GeniL, a multilingual dataset of over 50K sentences from 9 languages annotated for instances of generalizations.
arXiv Detail & Related papers (2024-04-08T20:58:06Z)
- SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization, where a model must interpret novel syntactic structures, is often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
- How Do In-Context Examples Affect Compositional Generalization? [86.57079616209474]
In this paper, we present CoFe, a test suite to investigate in-context compositional generalization.
We find that the compositional generalization performance can be easily affected by the selection of in-context examples.
Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple; a toy selection heuristic in this spirit is sketched below.
arXiv Detail & Related papers (2023-05-08T16:32:18Z)
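As a hypothetical illustration of that finding (not CoFe's actual procedure), one could select demonstrations that balance structural similarity to the test case against diversity among the chosen set; the token-sequence ratio below is a crude stand-in for a real structural similarity measure.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Crude structural proxy: ratio of matching token subsequences.
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def pick_demos(test_input: str, pool: list[str], k: int = 3) -> list[str]:
    """Greedily pick examples similar to the test case but unlike each other."""
    chosen: list[str] = []
    candidates = list(pool)
    while candidates and len(chosen) < k:
        best = max(
            candidates,
            key=lambda c: similarity(c, test_input)
            - max((similarity(c, d) for d in chosen), default=0.0),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

The subtraction implements a maximal-marginal-relevance-style trade-off between similarity and diversity; the "individually simple" criterion could be approximated by additionally penalizing candidate length.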
- Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions [73.56753518339247]
We present a novel framework informed by linguistic theory to generate exemplars -- specific cases when a generic holds true or false.
We generate 19k exemplars for 650 generics and show that our framework outperforms a strong GPT-3 baseline by 12.8 precision points.
arXiv Detail & Related papers (2022-05-23T22:45:53Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out spurious correlations between certain syntactic patterns and toxicity labels.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
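The snippet below is a toy selective-rationalization sketch under assumed toy dimensions, not the paper's InvRat implementation: a generator gates tokens, a predictor classifies from the gated input, and a sparsity penalty keeps rationales short. The full InvRat game additionally trains an environment-aware predictor and penalizes rationales whose predictive power varies across environments, which is what rules out spurious features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Scores each token; a sigmoid gate gives a soft keep-probability."""
    def __init__(self, vocab: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.score(self.emb(ids))).squeeze(-1)  # (B, T)

class Predictor(nn.Module):
    """Classifies from the gated input only."""
    def __init__(self, vocab: int, dim: int = 64, classes: int = 2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.clf = nn.Linear(dim, classes)

    def forward(self, ids: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        h = self.emb(ids) * mask.unsqueeze(-1)  # zero out unkept tokens
        return self.clf(h.mean(dim=1))

# One toy training step: task loss plus a sparsity penalty on the mask.
gen, pred = Generator(1000), Predictor(1000)
opt = torch.optim.Adam([*gen.parameters(), *pred.parameters()], lr=1e-3)
ids = torch.randint(0, 1000, (4, 16))   # toy batch of token ids
labels = torch.randint(0, 2, (4,))      # toy toxicity labels
opt.zero_grad()
mask = gen(ids)
loss = F.cross_entropy(pred(ids, mask), labels) + 0.01 * mask.mean()
loss.backward()
opt.step()
```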
- Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? [27.590858384414567]
We ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization?
We propose new train and test splits of non-synthetic datasets to better assess this capability.
We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-24T00:38:27Z)
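A plausible wiring of such a hybrid, sketched under the assumption that the grammar-based component abstains (returns None) when it has no derivation; the function names and signatures are illustrative, not the paper's API.

```python
from typing import Callable, Optional

def hybrid_parse(
    utterance: str,
    grammar_parse: Callable[[str], Optional[str]],
    seq2seq_generate: Callable[[str], str],
) -> str:
    """Prefer the high-precision grammar; back off to the flexible model."""
    parse = grammar_parse(utterance)      # high precision, low coverage
    if parse is not None:
        return parse                      # trust the grammar when it fires
    return seq2seq_generate(utterance)    # fall back on natural variation
```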
- Detecting and Understanding Generalization Barriers for Neural Machine Translation [53.23463279153577]
This paper attempts to identify and understand generalization barrier words within an unseen input sentence.
We propose a principled definition of generalization barrier words and a modified version that is computationally tractable.
We then conduct extensive analyses of the detected generalization barrier words on both directions of the Zh⇔En NIST benchmarks.
arXiv Detail & Related papers (2020-04-05T12:33:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.