What does the Failure to Reason with "Respectively" in Zero/Few-Shot
Settings Tell Us about Language Models?
- URL: http://arxiv.org/abs/2305.19597v1
- Date: Wed, 31 May 2023 06:45:09 GMT
- Title: What does the Failure to Reason with "Respectively" in Zero/Few-Shot
Settings Tell Us about Language Models?
- Authors: Ruixiang Cui, Seolhwa Lee, Daniel Hershcovich, Anders Søgaard
- Abstract summary: We examine how language models (LMs) reason with respective readings from two perspectives: syntactic-semantic and commonsense-world knowledge.
We show that fine-tuned NLI models struggle with understanding such readings without explicit supervision.
- Score: 5.431715810374623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans can effortlessly understand the coordinate structure of sentences such
as "Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle,
respectively". In the context of natural language inference (NLI), we examine
how language models (LMs) reason with respective readings (Gawron and Kehler,
2004) from two perspectives: syntactic-semantic and commonsense-world
knowledge. We propose a controlled synthetic dataset WikiResNLI and a naturally
occurring dataset NatResNLI to encompass various explicit and implicit
realizations of "respectively". We show that fine-tuned NLI models struggle
with understanding such readings without explicit supervision. While few-shot
learning is easy in the presence of explicit cues, longer training is required
when the reading is evoked implicitly, leaving models to rely on common sense
inferences. Furthermore, our fine-grained analysis indicates models fail to
generalize across different constructions. To conclude, we demonstrate that LMs
still lag behind humans in generalizing to the long tail of linguistic
constructions.
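As a minimal sketch of the NLI probing setup the abstract describes (not the authors' code or data), the snippet below queries an off-the-shelf MNLI-fine-tuned model with the "respectively" premise from the abstract and two hypotheses that the respective reading entails and contradicts. The checkpoint name and the hypothesis sentences are illustrative assumptions, not items from WikiResNLI or NatResNLI.

```python
# Hypothetical probe of a "respectively" reading with an off-the-shelf NLI model.
# Model choice and hypotheses are illustrative, not drawn from the paper's datasets.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # any MNLI-fine-tuned checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

premise = ("Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, "
           "respectively.")
hypotheses = [
    "Niels Bohr was born in Copenhagen.",   # respective reading: entailed
    "Kurt Cobain was born in Copenhagen.",  # respective reading: contradicted
]

for hypothesis in hypotheses:
    # Encode the premise-hypothesis pair and take the most probable NLI label.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze(0)
    label = model.config.id2label[int(probs.argmax())]
    print(f"{hypothesis} -> {label} ({probs.max():.2f})")
```

If the paper's findings hold, a probe like this should handle pairs with explicit cues but degrade on implicit realizations of "respectively", where the model must fall back on commonsense inference.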
Related papers
- "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of
Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in the analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z) - Large Language Models Are Partially Primed in Pronoun Interpretation [6.024776891570197]
We investigate whether large language models (LLMs) display human-like referential biases using stimuli and procedures from real psycholinguistic experiments.
Recent psycholinguistic studies suggest that humans adapt their referential biases with recent exposure to referential patterns.
We find that InstructGPT adapts its pronominal interpretations in response to the frequency of referential patterns in the local discourse.
arXiv Detail & Related papers (2023-05-26T13:30:48Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Machine Reading, Fast and Slow: When Do Models "Understand" Language? [59.897515617661874]
We investigate the behavior of reading comprehension models with respect to two linguistic 'skills': coreference resolution and comparison.
We find that for comparison (but not coreference) the systems based on larger encoders are more likely to rely on the 'right' information.
arXiv Detail & Related papers (2022-09-15T16:25:44Z) - Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on
a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates.
Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z) - Provable Limitations of Acquiring Meaning from Ungrounded Form: What
will Future Language Models Understand? [87.20342701232869]
We investigate the abilities of ungrounded systems to acquire meaning.
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
We find that assertions enable semantic emulation if all expressions in the language are referentially transparent.
However, if the language uses non-transparent patterns like variable binding, we show that emulation can become an uncomputable problem.
arXiv Detail & Related papers (2021-04-22T01:00:17Z) - The Singleton Fallacy: Why Current Critiques of Language Models Miss the
Point [3.096615629099618]
We discuss the current critique against neural network-based Natural Language Understanding (NLU) solutions known as language models.
We argue that there are many different types of language use, meaning, and understanding, and that (current) language models are built with the explicit purpose of acquiring and representing one type of structural understanding of language.
arXiv Detail & Related papers (2021-02-08T16:12:36Z) - Discourse structure interacts with reference but not syntax in neural
language models [17.995905582226463]
We study the ability of language models (LMs) to learn interactions between different linguistic representations.
We find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax.
Our results suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement.
arXiv Detail & Related papers (2020-10-10T03:14:00Z) - An Analysis of the Utility of Explicit Negative Examples to Improve the
Syntactic Abilities of Neural Language Models [32.183409062294466]
We explore the utility of explicit negative examples in training neural language models.
We find that even with our direct learning signals the models still suffer from resolving agreement across an object-relative clause.
arXiv Detail & Related papers (2020-04-06T07:47:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.