AMR Similarity Metrics from Principles
- URL: http://arxiv.org/abs/2001.10929v2
- Date: Thu, 17 Sep 2020 09:34:56 GMT
- Title: AMR Similarity Metrics from Principles
- Authors: Juri Opitz and Letitia Parcalabescu and Anette Frank
- Abstract summary: We establish criteria that enable researchers to perform a principled assessment of metrics comparing meaning representations like AMR.
We propose a novel metric S$^2$match that is lenient toward only very slight meaning deviations and targets the fulfilment of all established criteria.
- Score: 21.915057426589748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different metrics have been proposed to compare Abstract Meaning Representation (AMR) graphs. The canonical Smatch metric (Cai and Knight, 2013) aligns the variables of two graphs and assesses triple matches. The recent SemBleu metric (Song and Gildea, 2019) is based on the machine-translation metric Bleu (Papineni et al., 2002) and increases computational efficiency by ablating the variable alignment. In this paper, i) we establish criteria that enable researchers to perform a principled assessment of metrics comparing meaning representations like AMR; ii) we undertake a thorough analysis of Smatch and SemBleu, showing that the latter exhibits undesirable properties: for example, it does not conform to the identity-of-indiscernibles rule and introduces biases that are hard to control; and iii) we propose a novel metric S$^2$match that is lenient toward only very slight meaning deviations and targets the fulfilment of all established criteria. We assess its suitability and show its advantages over Smatch and SemBleu.
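To make the mechanics concrete, the following is a minimal sketch of Smatch-style triple matching and the graded credit that S$^2$match adds. It is not the authors' implementation: the variable-alignment search (hill-climbing in Smatch) is omitted and an alignment is assumed given, and `toy_cosine` is a hypothetical stand-in for an embedding similarity such as the GloVe cosine the paper uses.

```python
# Minimal sketch, not the authors' code: Smatch computes an F1 over
# matched (source, relation, target) triples under the best variable
# alignment (found by hill-climbing, omitted here); S^2match keeps the
# same procedure but grants graded credit for near-identical concepts.

def triple_f1(triples_a, triples_b, triple_sim):
    """F1 over triples, with per-triple credit from `triple_sim`."""
    used, total = set(), 0.0
    for ta in triples_a:                       # greedy 1:1 matching
        best_s, best_j = 0.0, None
        for j, tb in enumerate(triples_b):
            s = triple_sim(ta, tb)
            if j not in used and s > best_s:
                best_s, best_j = s, j
        if best_j is not None:
            used.add(best_j)
            total += best_s
    p, r = total / len(triples_a), total / len(triples_b)
    return 2 * p * r / (p + r) if p + r else 0.0

def hard_match(ta, tb):                        # Smatch: all-or-nothing
    return 1.0 if ta == tb else 0.0

toy_cosine = {("cat", "kitten"): 0.8}          # hypothetical embedding similarity

def soft_match(ta, tb, tau=0.5):               # S^2match-style graded credit
    if ta == tb:
        return 1.0
    if ta[:2] == tb[:2]:                       # same source and relation
        s = toy_cosine.get((ta[2], tb[2]), toy_cosine.get((tb[2], ta[2]), 0.0))
        return s if s >= tau else 0.0
    return 0.0

g1 = [("c", "instance", "cat"), ("s", "instance", "sleep-01"), ("s", "ARG0", "c")]
g2 = [("c", "instance", "kitten"), ("s", "instance", "sleep-01"), ("s", "ARG0", "c")]
print(triple_f1(g1, g2, hard_match))           # 0.67: "cat" vs "kitten" gets no credit
print(triple_f1(g1, g2, soft_match))           # 0.93: slight deviation, graded credit
```

The threshold `tau` here is only illustrative; the point is that a hard matcher treats any concept mismatch as a total miss, while the soft variant preserves most of the score when the meaning deviation is slight.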
Related papers
- Rematch: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity [6.1980259703476674]
We introduce a novel AMR similarity metric, rematch, alongside a new evaluation for structural similarity called RARE.
Rematch ranks second in structural similarity and first in semantic similarity, leading by 1--5 percentage points on the STS-B and SICK-R benchmarks.
arXiv Detail & Related papers (2024-04-02T17:33:00Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study whether there are deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
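For readers unfamiliar with the two ERASER metrics, here is a sketch of their standard definitions (DeYoung et al., 2020); `predict_proba` is an assumed stand-in for any classifier that returns class probabilities for a token list, not part of the benchmark's code.

```python
# Sketch of the ERASER definitions (DeYoung et al., 2020);
# `predict_proba` is an assumed model interface.

def comprehensiveness(predict_proba, tokens, rationale_idx, label):
    # Probability drop when the rationale tokens are *removed*:
    # high values mean the explanation was needed for the prediction.
    rest = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    return predict_proba(tokens)[label] - predict_proba(rest)[label]

def sufficiency(predict_proba, tokens, rationale_idx, label):
    # Probability drop when *only* the rationale tokens are kept:
    # low values mean the explanation alone supports the prediction.
    only = [t for i, t in enumerate(tokens) if i in rationale_idx]
    return predict_proba(tokens)[label] - predict_proba(only)[label]
```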
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization [3.5297361401370044]
The ROUGE metric has been criticized for its lack of semantic awareness and its disregard for the ranking quality of the extractive summarizer.
Previous research has introduced a gain-based automated metric called Sem-nCG that addresses these issues.
We propose a redundancy-aware Sem-nCG metric and demonstrate how it can be used to evaluate model summaries against multiple references.
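As background for the gain-based idea, a minimal sketch of an nCG-style score follows; the per-sentence gains here are made-up relevance numbers, whereas Sem-nCG derives them from semantic similarity to the reference.

```python
# Illustrative sketch of a normalized-cumulative-gain score for an
# extractive summary; gains are invented relevance values, while
# Sem-nCG computes them from semantic sentence similarity.

def ncg_at_k(gains, selected, k):
    """Gain of the selected sentences over the ideal top-k gain."""
    achieved = sum(gains[i] for i in selected[:k])
    ideal = sum(sorted(gains, reverse=True)[:k])
    return achieved / ideal if ideal else 0.0

gains = [0.9, 0.1, 0.7, 0.3]                   # relevance of each source sentence
print(ncg_at_k(gains, selected=[0, 3], k=2))   # 0.75 = (0.9 + 0.3) / (0.9 + 0.7)
```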
arXiv Detail & Related papers (2023-08-04T11:47:19Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods are commonly evaluated using single-agent metrics (marginal metrics).
Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group.
We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
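The marginal-versus-joint distinction is easy to state in code. Below is an illustrative sketch (not the paper's implementation), assuming K predicted modes shared across A agents; JFDE is analogous but uses only the final timestep.

```python
# Illustrative sketch of marginal vs. joint ADE; pred has shape
# (K modes, A agents, T steps, 2) and gt has shape (A, T, 2).
import numpy as np

def min_ade(pred, gt):
    # Marginal: every agent may pick its own best mode, which can mix
    # trajectories that were never predicted together.
    err = np.linalg.norm(pred - gt, axis=-1).mean(-1)   # (K, A)
    return err.min(axis=0).mean()

def jade(pred, gt):
    # Joint: one shared mode is scored for all agents at once, so the
    # chosen futures must be mutually consistent.
    err = np.linalg.norm(pred - gt, axis=-1).mean(-1)   # (K, A)
    return err.mean(axis=1).min()
```

By construction, JADE is never lower than marginal minADE, which is how marginal scores can look strong even when no single predicted mode is jointly plausible.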
arXiv Detail & Related papers (2023-05-10T16:27:55Z)
- SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate limitations of existing text generation metrics.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform all competing metrics.
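Here is a sketch of the sentence-level soft matching this describes, mirroring only a SMART-1-style aggregation; `match_fn` stands in for whatever sentence-pair scorer is plugged in, and `token_f1` is a simple hypothetical matcher.

```python
# Sketch of sentence-level soft matching (SMART-1-style aggregation
# only); match_fn is any sentence-pair scorer in [0, 1].

def smart_f1(cand_sents, ref_sents, match_fn):
    prec = sum(max(match_fn(c, r) for r in ref_sents)
               for c in cand_sents) / len(cand_sents)
    rec = sum(max(match_fn(c, r) for c in cand_sents)
              for r in ref_sents) / len(ref_sents)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def token_f1(a, b):
    # Simple stand-in matcher; a trained model-based scorer can be
    # swapped in without changing the aggregation above.
    ta, tb = set(a.split()), set(b.split())
    return 2 * len(ta & tb) / (len(ta) + len(tb)) if ta or tb else 0.0
```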
arXiv Detail & Related papers (2022-08-01T17:58:05Z)
- SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable AMR Meaning Features [22.8438857884398]
We create similarity metrics that are highly effective, while also providing an interpretable rationale for their rating.
Our approach works in two steps: We first select AMR graph metrics that measure meaning similarity of sentences with respect to key semantic facets.
Second, we employ these metrics to induce Semantically Structured Sentence BERT embeddings, which are composed of different meaning aspects captured in different sub-spaces.
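An illustrative sketch of the resulting structure follows, with invented facet names and slice boundaries: once sub-spaces are tied to meaning aspects (in the paper, by training each sub-space to mimic one AMR graph metric), per-facet similarities can be read off slice by slice, which is where the interpretable rationale comes from.

```python
# Illustrative only: facet names and slice boundaries are invented;
# in the paper each sub-space is trained against an AMR graph metric.
import numpy as np

FACETS = {"concepts": slice(0, 16),
          "negation": slice(16, 24),
          "semantic_roles": slice(24, 40)}

def facet_similarities(emb_a, emb_b):
    """Cosine similarity per meaning facet of two sentence embeddings."""
    sims = {}
    for name, sl in FACETS.items():
        a, b = emb_a[sl], emb_b[sl]
        sims[name] = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sims   # reveals which meaning facets agree or disagree
```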
arXiv Detail & Related papers (2022-06-14T17:37:18Z)
- Weisfeiler-Leman in the BAMBOO: Novel AMR Graph Metrics and a Benchmark for AMR Graph Similarity [12.375561840897742]
We propose new AMR similarity metrics that unify the strengths of previous metrics, while mitigating their weaknesses.
Specifically, our new metrics are able to match contextualized substructures and induce n:m alignments between their nodes.
We introduce a Benchmark for AMR Metrics based on Overt Objectives (BAMBOO) to support empirical assessment of graph-based MR similarity metrics.
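For context, here is a sketch of the vanilla Weisfeiler-Leman refinement these metrics build on: labels absorb neighborhood information at each iteration, so equal labels indicate equal contextualized substructures. The paper's metrics go further (e.g., inducing n:m node alignments); this shows only the basic mechanism.

```python
# Vanilla Weisfeiler-Leman label refinement (the paper's metrics
# extend this idea); adj maps each node to its neighbor list.
from collections import Counter

def wl_features(adj, labels, iterations=2):
    feats = Counter(labels.values())
    for _ in range(iterations):
        # Each node's new label = old label + sorted neighbor labels,
        # so labels grow to encode larger substructures.
        labels = {n: labels[n] + "|" + ".".join(sorted(labels[m] for m in adj[n]))
                  for n in adj}
        feats.update(labels.values())
    return feats

def wl_similarity(f1, f2):
    # Dice overlap of the refined-label multisets of two graphs.
    inter = sum((f1 & f2).values())
    return 2 * inter / (sum(f1.values()) + sum(f2.values()))
```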
arXiv Detail & Related papers (2021-08-26T17:58:54Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.