"You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of
Abstract Meaning Representation
- URL: http://arxiv.org/abs/2310.17793v2
- Date: Mon, 11 Dec 2023 17:11:25 GMT
- Title: "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of
Abstract Meaning Representation
- Authors: Allyson Ettinger, Jena D. Hwang, Valentina Pyatkin, Chandra
Bhagavatula, Yejin Choi
- Abstract summary: We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
- Score: 60.863629647985526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) show amazing proficiency and fluency in the use
of language. Does this mean that they have also acquired insightful linguistic
knowledge about the language, to an extent that they can serve as an "expert
linguistic annotator"? In this paper, we examine the successes and limitations
of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning
structure, focusing on the Abstract Meaning Representation (AMR; Banarescu et
al. 2013) parsing formalism, which provides rich graphical representations of
sentence meaning structure while abstracting away from surface forms. We
compare models' analysis of this semantic structure across two settings: 1)
direct production of AMR parses based on zero- and few-shot prompts, and 2)
indirect partial reconstruction of AMR via metalinguistic natural language
queries (e.g., "Identify the primary event of this sentence, and the predicate
corresponding to that event."). Across these settings, we find that models can
reliably reproduce the basic format of AMR, and can often capture core event,
argument, and modifier structure -- however, model outputs are prone to
frequent and major errors, and holistic analysis of parse acceptability shows
that even with few-shot demonstrations, models have virtually 0% success in
producing fully accurate parses. Eliciting natural language responses produces
similar patterns of errors. Overall, our findings indicate that these models
out-of-the-box can capture aspects of semantic structure, but there remain key
limitations in their ability to support fully accurate semantic analyses or
parses.
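
The two elicitation settings described in the abstract are straightforward to sketch in code. Below is a minimal, hypothetical reconstruction, assuming a generic `generate(prompt) -> str` stand-in for whichever LLM API is used (GPT-3, ChatGPT, or GPT-4 in the paper); the demonstration AMR, prompt wording, and helper names are illustrative, not the paper's actual prompts.

```python
# Minimal sketch of the paper's two elicitation settings, assuming a generic
# `generate(prompt) -> str` stand-in for an LLM API call. The demonstration
# AMR and all helper names are illustrative, not the paper's actual prompts.

# One few-shot demonstration in PENMAN notation: "The boy wants to go."
DEMO = """Sentence: The boy wants to go.
AMR:
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
      :ARG0 b))"""

def direct_amr_prompt(sentence: str, few_shot: bool = True) -> str:
    """Setting 1: direct production of a full AMR parse, zero- or few-shot."""
    header = "You are an expert linguistic annotator. Produce the AMR parse."
    demo = DEMO + "\n\n" if few_shot else ""
    return f"{header}\n\n{demo}Sentence: {sentence}\nAMR:"

def metalinguistic_prompt(sentence: str) -> str:
    """Setting 2: indirect partial reconstruction via natural language queries
    (the query below is quoted from the abstract)."""
    return (
        f"Sentence: {sentence}\n"
        "Identify the primary event of this sentence, and the predicate "
        "corresponding to that event."
    )

# Usage (with any LLM client supplying `generate`):
#   parse  = generate(direct_amr_prompt("The dog ate the bone."))
#   answer = generate(metalinguistic_prompt("The dog ate the bone."))
```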
Related papers
- MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing [4.536003573070846]
We introduce a cross-lingual training strategy for semantic representation parsing models.
It exploits the alignments between languages encoded in pre-trained language models.
Experiments show significant improvements in DRS clause and graph parsing in English, German, Italian and Dutch.
arXiv Detail & Related papers (2024-06-03T07:02:57Z)
- Analyzing the Role of Semantic Representations in the Era of Large Language Models [104.18157036880287]
We investigate the role of semantic representations in the era of large language models (LLMs).
We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT (a minimal prompt sketch follows this list).
We find it difficult to predict which input examples AMR helps or hurts on, but errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic CFGs whose hierarchical rules can generate lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers [0.0]
This paper aims to create more flexible model explainability summaries, organized by segments of observations or clusters of semantically related words.
In addition, we introduce a root cause analysis method for NLP models, by analyzing representative False Positive and False Negative examples from different segments.
arXiv Detail & Related papers (2023-03-06T22:59:02Z)
- Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling [21.19785690690611]
This work presents an in-depth analysis of discrete self-supervised speech representations through the lens of Generative Spoken Language Modeling (GSLM).
We propose practical improvements to the discrete units used in GSLM.
arXiv Detail & Related papers (2023-01-02T10:36:40Z)
- Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates.
Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performance comparable to that achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
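
For the AMRCoT entry above, the following is a minimal sketch of AMR-driven chain-of-thought prompting, assuming the same hypothetical `generate(prompt) -> str` LLM stand-in as in the earlier sketch; the prompt wording is illustrative, not the paper's.

```python
# Hypothetical AMRCoT-style prompt: elicit an AMR as an intermediate
# reasoning step, then condition the final answer on that structure.
# The wording is illustrative; the paper's actual prompts may differ.

def amr_cot_prompt(sentence: str, question: str) -> str:
    return (
        "Step 1: Write the Abstract Meaning Representation (AMR) of the "
        "sentence in PENMAN notation.\n"
        "Step 2: Using that AMR, answer the question.\n\n"
        f"Sentence: {sentence}\n"
        f"Question: {question}\n"
        "AMR:"
    )

# Usage: answer = generate(amr_cot_prompt("The dog ate the bone.",
#                                         "Who performed the eating event?"))
```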