AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing
Evaluation Suite
- URL: http://arxiv.org/abs/2312.03480v1
- Date: Wed, 6 Dec 2023 13:19:56 GMT
- Title: AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing
Evaluation Suite
- Authors: Jonas Groschwitz, Shay B. Cohen, Lucia Donatelli, Meaghan Fowlie
- Abstract summary: We present the Granular AMR Parsing Evaluation Suite (GrAPES).
GrAPES reveals in depth the abilities and shortcomings of current AMR parsers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Granular AMR Parsing Evaluation Suite (GrAPES), a challenge
set for Abstract Meaning Representation (AMR) parsing with accompanying
evaluation metrics. AMR parsers now obtain high scores on the standard AMR
evaluation metric Smatch, close to or even above reported inter-annotator
agreement. But that does not mean that AMR parsing is solved; in fact, human
evaluation in previous work indicates that current parsers still quite
frequently make errors on node labels or graph structure that substantially
distort sentence meaning. Here, we provide an evaluation suite that tests AMR
parsers on a range of phenomena of practical, technical, and linguistic
interest. Our 36 categories range from seen and unseen labels, to structural
generalization, to coreference. GrAPES reveals in depth the abilities and
shortcomings of current AMR parsers.
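The Smatch metric mentioned in the abstract scores a parser by matching the triples of a predicted AMR graph against those of the gold graph. As a rough illustration only (the real Smatch implementation additionally searches over variable mappings via hill climbing; here variables are assumed to be pre-aligned, and the example sentence and triples are invented for demonstration), a minimal sketch of triple-overlap F1:

```python
# Toy illustration of triple-overlap scoring in the spirit of Smatch.
# Not the official metric: real Smatch also searches for the variable
# mapping that maximizes the match; we assume variables are aligned.

def f1(predicted, gold):
    """F1 over matched (source, relation, target) triples."""
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    matched = len(pred & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

# "The boy wants to go": (w / want-01 :ARG0 (b / boy)
#                                     :ARG1 (g / go-02 :ARG0 b))
gold = [("w", "instance", "want-01"), ("b", "instance", "boy"),
        ("g", "instance", "go-02"), ("w", "ARG0", "b"),
        ("w", "ARG1", "g"), ("g", "ARG0", "b")]
# A parse that drops the reentrant edge g :ARG0 b -- exactly the kind
# of small structural error that barely dents the score.
predicted = gold[:-1]

print(round(f1(predicted, gold), 3))  # 0.909
```

Note how losing the reentrancy, which changes who is going, still leaves a score above 0.9; this is the gap between aggregate Smatch and the phenomenon-level evaluation GrAPES targets.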
Related papers
- Widely Interpretable Semantic Representation: Frameless Meaning
Representation for Broader Applicability [10.710058244695128]
This paper presents a novel semantic representation, WISeR, that overcomes challenges for Abstract Meaning Representation (AMR).
Despite its strengths, AMR is not easily applied to languages or domains without predefined semantic frames.
We create a new corpus of 1K English dialogue sentences annotated in both WISeR and AMR.
arXiv Detail & Related papers (2023-09-12T17:44:40Z) - Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z) - Leveraging Denoised Abstract Meaning Representation for Grammatical
Error Correction [53.55440811942249]
Grammatical Error Correction (GEC) is the task of correcting erroneous sentences into grammatically correct, semantically consistent, and coherent sentences.
We propose the AMR-GEC, a seq-to-seq model that incorporates denoised AMR as additional knowledge.
arXiv Detail & Related papers (2023-07-05T09:06:56Z) - An AMR-based Link Prediction Approach for Document-level Event Argument
Extraction [51.77733454436013]
Recent works have introduced Abstract Meaning Representation (AMR) for Document-level Event Argument Extraction (Doc-level EAE).
This work reformulates EAE as a link prediction problem on AMR graphs.
We propose a novel graph structure, Tailored AMR Graph (TAG), which compresses less informative subgraphs and edge types, integrates span information, and highlights surrounding events in the same document.
arXiv Detail & Related papers (2023-05-30T16:07:48Z) - Cross-domain Generalization for AMR Parsing [30.34105706152887]
We evaluate five representative AMR parsers on five domains and analyze challenges to cross-domain AMR parsing.
Based on our observation, we investigate two approaches to reduce the domain distribution divergence of text and AMR features.
arXiv Detail & Related papers (2022-10-22T13:24:13Z) - Retrofitting Multilingual Sentence Embeddings with Abstract Meaning
Representation [70.58243648754507]
We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR).
Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously.
Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic similarity and transfer tasks.
arXiv Detail & Related papers (2022-10-18T11:37:36Z) - Better Smatch = Better Parser? AMR evaluation is not so simple anymore [22.8438857884398]
We conduct an analysis of two popular and strong AMR parsers that reach quality levels on par with human IAA.
At such high performance levels, better Smatch scores do not necessarily indicate consistently better parsing quality.
arXiv Detail & Related papers (2022-10-12T17:57:48Z) - Inducing and Using Alignments for Transition-based AMR Parsing [51.35194383275297]
We propose a neural aligner for AMR that learns node-to-word alignments without relying on complex pipelines.
We attain a new state of the art for gold-only trained models, matching silver-trained performance without the need for beam search on AMR3.0.
arXiv Detail & Related papers (2022-05-03T12:58:36Z) - Probabilistic, Structure-Aware Algorithms for Improved Variety,
Accuracy, and Coverage of AMR Alignments [9.74672460306765]
We present algorithms for aligning components of Abstract Meaning Representation (AMR) graphs to spans in English sentences.
We leverage unsupervised learning in combination with graph-based techniques, taking the best of both worlds from previous AMR aligners.
Our approach covers a wider variety of AMR substructures than previously considered, achieves higher coverage of nodes and edges, and does so with higher accuracy.
arXiv Detail & Related papers (2021-06-10T18:46:32Z) - Making Better Use of Bilingual Information for Cross-Lingual AMR Parsing [88.08581016329398]
We argue that the misprediction of concepts is due to the high relevance between English tokens and AMR concepts.
We introduce bilingual input, namely the translated texts as well as non-English texts, in order to enable the model to predict more accurate concepts.
arXiv Detail & Related papers (2021-06-09T05:14:54Z) - Towards a Decomposable Metric for Explainable Evaluation of Text
Generation from AMR [22.8438857884398]
AMR-to-text generation systems are typically evaluated using metrics that compare the generated texts to reference texts from which the input meaning representations were constructed.
We show that besides well-known issues from which such metrics suffer, an additional problem arises when applying these metrics for AMR-to-text evaluation.
We show that fulfillment of both principles offers benefits for AMR-to-text evaluation, including explainability of scores.
arXiv Detail & Related papers (2020-08-20T11:25:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.