A Unified Framework for Rank-based Evaluation Metrics for Link
Prediction in Knowledge Graphs
- URL: http://arxiv.org/abs/2203.07544v1
- Date: Mon, 14 Mar 2022 23:09:46 GMT
- Title: A Unified Framework for Rank-based Evaluation Metrics for Link
Prediction in Knowledge Graphs
- Authors: Charles Tapley Hoyt, Max Berrendorf, Mikhail Galkin, Volker Tresp,
Benjamin M. Gyori
- Abstract summary: The link prediction task on knowledge graphs, where the training data contain no explicit negative triples, motivates the use of rank-based metrics.
We introduce a simple theoretical framework for rank-based metrics, upon which we investigate two avenues for improving existing metrics: alternative aggregation functions and concepts from probability theory.
We propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their use in benchmarking knowledge graph embedding models.
- Score: 19.822126244784133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The link prediction task on knowledge graphs without explicit
negative triples in the training data motivates the use of rank-based metrics.
Here, we review existing rank-based metrics and propose desiderata for
improved metrics that address the lack of interpretability of existing
metrics and their limited comparability across datasets of different sizes
and properties. We introduce a simple theoretical framework for rank-based
metrics, upon which we investigate two avenues for improving existing
metrics: alternative aggregation functions and concepts from probability
theory. Finally, we propose several new rank-based metrics that are more
easily interpreted and compared, accompanied by a demonstration of their use
in benchmarking knowledge graph embedding models.
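To ground these metrics, here is a minimal, illustrative sketch (not the authors' implementation) of the standard rank-based metrics (mean rank, mean reciprocal rank, Hits@k) together with one expectation-adjusted variant of the kind the abstract alludes to: under a uniformly random ranking over n candidates the expected rank is (n + 1) / 2, so indexing against this expectation yields a score comparable across datasets of different sizes. The function name and the exact adjusted-index formula below are assumptions for illustration.

```python
import numpy as np

def rank_based_metrics(ranks, num_candidates, k=10):
    """Sketch of rank-based link-prediction metrics plus an
    expectation-adjusted variant. `ranks` holds the 1-based rank of the
    true entity per evaluation triple; `num_candidates` the size of the
    scored candidate set per triple."""
    ranks = np.asarray(ranks, dtype=float)
    num_candidates = np.asarray(num_candidates, dtype=float)

    mr = ranks.mean()                # arithmetic mean rank (lower is better)
    mrr = (1.0 / ranks).mean()       # harmonic aggregation: mean reciprocal rank
    hits_at_k = (ranks <= k).mean()  # share of true entities ranked in the top k

    # For a uniformly random ranking over n candidates, E[rank] = (n + 1) / 2.
    expected_mr = ((num_candidates + 1.0) / 2.0).mean()

    # Expectation-adjusted index (illustrative): 1 = perfect ranking,
    # ~0 = no better than random, < 0 = worse than random. Unlike raw MR,
    # this is comparable across datasets with different candidate-set sizes.
    adjusted_index = 1.0 - (mr - 1.0) / (expected_mr - 1.0)

    return {"MR": mr, "MRR": mrr, f"Hits@{k}": hits_at_k,
            "adjusted_index": adjusted_index}

# Example: five evaluation triples, each scored against 1,000 candidates.
print(rank_based_metrics([1, 3, 40, 2, 500], [1000] * 5, k=10))
```

Swapping the arithmetic mean for other aggregations (e.g., geometric or harmonic) and replacing the uniform-ranking baseline with other expectations are exactly the kinds of design choices the paper's framework makes explicit.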
Related papers
- A Unified View of Evaluation Metrics for Structured Prediction [41.29492827464339] (arXiv: 2023-10-20)
We present a conceptual framework that unifies evaluation metrics for different structured prediction tasks.
Our framework requires representing the outputs of these tasks as objects of certain data types.
We show that new metrics can be naturally derived in a bottom-up way based on an output structure.
- KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507] (arXiv: 2022-08-23)
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data.
In our experiments, we highlight findings made with KGxBoard that would have been impossible to detect with standard averaged single-score metrics.
- Classification Performance Metric Elicitation and its Applications [5.5637552942511155] (arXiv: 2022-08-19)
Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications.
This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences.
- SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085] (arXiv: 2022-08-01)
In this paper, we introduce a new metric called SMART to mitigate such limitations.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform those of all competing metrics.
- Cross-Domain Few-Shot Graph Classification [7.23389716633927] (arXiv: 2022-01-20)
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces.
We propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views.
We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy.
- Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand [117.62186420147563] (arXiv: 2021-12-08)
We propose a generalization of leaderboards: bidimensional leaderboards (Billboards).
Unlike conventional unidimensional leaderboards that sort submitted systems by predetermined metrics, a Billboard accepts both generators and evaluation metrics as competing entries.
We demonstrate that a linear ensemble of a few diverse metrics sometimes substantially outperforms existing metrics in isolation.
- Weisfeiler-Leman in the BAMBOO: Novel AMR Graph Metrics and a Benchmark for AMR Graph Similarity [12.375561840897742] (arXiv: 2021-08-26)
We propose new AMR similarity metrics that unify the strengths of previous metrics, while mitigating their weaknesses.
Specifically, our new metrics are able to match contextualized substructures and induce n:m alignments between their nodes.
We introduce a Benchmark for AMR Metrics based on Overt Objectives (BAMBOO) to support empirical assessment of graph-based MR similarity metrics.
- REAM$\sharp$: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation [63.46331073232526] (arXiv: 2021-05-30)
We present an enhancement approach to Reference-based EvAluation Metrics for open-domain dialogue systems.
A prediction model is designed to estimate the reliability of the given reference set.
We show how its predicted results can be helpful to augment the reference set, and thus improve the reliability of the metric.
- Evaluation Metrics for Conditional Image Generation [100.69766435176557] (arXiv: 2020-04-26)
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402] (arXiv: 2020-02-27)
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.