How Abstract Is Linguistic Generalization in Large Language Models?
Experiments with Argument Structure
- URL: http://arxiv.org/abs/2311.04900v1
- Date: Wed, 8 Nov 2023 18:58:43 GMT
- Title: How Abstract Is Linguistic Generalization in Large Language Models?
Experiments with Argument Structure
- Authors: Michael Wilson and Jackson Petty and Robert Frank
- Abstract summary: We investigate the degree to which pre-trained Transformer-based large language models represent relationships between contexts.
We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts.
However, LLMs fail at generalizations between related contexts that have not been observed during pre-training.
- Score: 2.530495315660486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models are typically evaluated on their success at predicting the
distribution of specific words in specific contexts. Yet linguistic knowledge
also encodes relationships between contexts, allowing inferences between word
distributions. We investigate the degree to which pre-trained Transformer-based
large language models (LLMs) represent such relationships, focusing on the
domain of argument structure. We find that LLMs perform well in generalizing
the distribution of a novel noun argument between related contexts that were
seen during pre-training (e.g., the active object and passive subject of the
verb spray), succeeding by making use of the semantically-organized structure
of the embedding space for word embeddings. However, LLMs fail at
generalizations between related contexts that have not been observed during
pre-training, but which instantiate more abstract, but well-attested structural
generalizations (e.g., between the active object and passive subject of an
arbitrary verb). Instead, in this case, LLMs show a bias to generalize based on
linear order. This finding points to a limitation of current models and
suggests one reason why their training is so data-intensive. Materials for the
experiments reported here are available at
https://github.com/clay-lab/structural-alternations.
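To make the probing idea concrete, below is a minimal sketch (not the authors' released code) of how one might compare a masked LM's expectations for a noun argument across two related argument-structure contexts (the active object and passive subject of "spray"). The model name, sentence frames, and the use of an existing noun as a stand-in for the paper's novel noun are illustrative assumptions.

```python
# Sketch: compare how a masked LM scores the same noun in two related
# argument-structure frames. The paper introduces a genuinely novel token;
# here an ordinary noun ("wall") is used as a stand-in for simplicity.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"  # assumed model choice for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

frames = {
    "active object":   "The farmer sprayed the [MASK] with water.",
    "passive subject": "The [MASK] was sprayed with water by the farmer.",
}
noun = "wall"
noun_id = tokenizer.convert_tokens_to_ids(noun)

for label, sentence in frames.items():
    inputs = tokenizer(sentence, return_tensors="pt")
    # Locate the [MASK] position and read off the log-probability of the noun.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits[0, mask_pos], dim=-1)
    print(f"{label}: log P({noun}) = {log_probs[noun_id].item():.3f}")
```

If the model has generalized the noun's distribution across the related contexts, the two scores should be similarly high; a large asymmetry suggests the relationship between the frames is not represented.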
Related papers
- Black Big Boxes: Do Language Models Hide a Theory of Adjective Order? [5.395055685742631]
In English and other languages, multiple adjectives in a complex noun phrase show intricate ordering patterns that have been a target of much linguistic theory.
We review existing hypotheses designed to explain Adjective Order Preferences (AOPs) in humans and develop a setup to study AOPs in language models.
We find that all models' predictions are much closer to human AOPs than predictions generated by factors identified in theoretical linguistics.
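As a rough illustration of how such a setup can be probed (an assumption, not the paper's exact protocol), one can compare the log-likelihood an autoregressive LM assigns to the two possible adjective orders:

```python
# Sketch: does the LM prefer the human-preferred adjective order
# ("big black box") over the dispreferred one ("black big box")?
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_logprob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the LM."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    return log_probs.gather(2, targets.unsqueeze(-1)).sum().item()

preferred = sentence_logprob("She bought a big black box.")
dispreferred = sentence_logprob("She bought a black big box.")
print("Model respects the human AOP:", preferred > dispreferred)
```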
arXiv Detail & Related papers (2024-07-02T10:29:09Z) - Learning from Natural Language Explanations for Generalizable Entity Matching [19.978468744557173]
We re-cast entity matching as a conditional generation task as opposed to binary classification.
This enables us to "distill" LLM reasoning into smaller entity matching models via natural language explanations.
arXiv Detail & Related papers (2024-06-13T17:08:58Z) - In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax [36.98247762224868]
In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks.
Do models infer the underlying structure of the task defined by the context, or do they rely on superficial generalizations that only generalize to identically distributed examples?
In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs.
The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size.
arXiv Detail & Related papers (2023-11-13T23:52:43Z) - SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization, where the syntactic structures themselves are unfamiliar, is often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
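For readers unfamiliar with the term, the following is a minimal sketch of (Nadaraya-Watson style) kernel regression, the estimator the paper argues in-context attention approximates: the prediction for a query is a similarity-weighted average of the demonstration labels. The dot-product kernel and toy data are illustrative assumptions.

```python
# Sketch: kernel regression over in-context (example, label) pairs.
import numpy as np

def kernel_regression(query, examples, labels, temperature=1.0):
    """Predict a label for `query` as a softmax-weighted average of `labels`,
    weighted by the (dot-product) similarity of `query` to each example."""
    sims = np.array([np.dot(query, x) / temperature for x in examples])
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    return np.sum(weights[:, None] * np.asarray(labels), axis=0)

# Toy usage: 2-D inputs with one-hot labels for two classes.
examples = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
labels = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(kernel_regression(np.array([0.9, 0.1]), examples, labels))
```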
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - Evaluating statistical language models as pragmatic reasoners [39.72348730045737]
We evaluate the capacity of large language models to infer meanings of pragmatic utterances.
We find that LLMs can derive context-grounded, human-like distributions over the interpretations of several complex pragmatic utterances.
Results inform the inferential capacity of statistical language models, and their use in pragmatic and semantic parsing applications.
arXiv Detail & Related papers (2023-05-01T18:22:10Z) - Semantic Role Labeling Meets Definition Modeling: Using Natural Language
to Describe Predicate-Argument Structures [104.32063681736349]
We present an approach to describe predicate-argument structures using natural language definitions instead of discrete labels.
Our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance.
arXiv Detail & Related papers (2022-12-02T11:19:16Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized
Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)