Rethinking Knowledge Graph Evaluation Under the Open-World Assumption
- URL: http://arxiv.org/abs/2209.08858v1
- Date: Mon, 19 Sep 2022 09:01:29 GMT
- Authors: Haotong Yang, Zhouchen Lin, Muhan Zhang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most knowledge graphs (KGs) are incomplete, which motivates an
important research topic: automatically completing knowledge graphs. However,
the evaluation of knowledge graph completion (KGC) models often ignores this
incompleteness: facts in the test set are ranked against all unknown triplets,
which may include many missing facts not yet recorded in the KG. Treating all
unknown triplets as false is called the closed-world assumption. This
closed-world assumption can compromise the fairness and consistency of the
evaluation metrics. In this paper, we study KGC evaluation under a more
realistic setting, the open-world assumption, in which unknown triplets are
assumed to include many missing facts absent from both the training and test
sets. For the most widely used metrics, mean reciprocal rank (MRR) and
Hits@K, we point out that their behavior may be unexpected under the
open-world assumption. Specifically, even with only a modest number of missing
facts, the reported numbers grow only logarithmically with the true strength
of the model, so a small metric increase may conceal a substantial model
improvement. Further, taking variance into account, we show that this
degradation can lead to incorrect comparisons between models: a stronger model
may report lower metric numbers. We validate the phenomenon both theoretically
and experimentally. Finally, we suggest possible causes of and solutions to
this problem. Our code and data are available at
https://github.com/GraphPKU/Open-World-KG .
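The effect described in the abstract can be made concrete with a short sketch. The following is a minimal, self-contained simulation (not the authors' code; the Gaussian score model and all parameters are invented for illustration): it computes MRR and Hits@K from ranks, and shows how unknown-but-true missing facts, treated as false under the closed-world assumption, depress the reported metric for a model that correctly scores them high.

```python
import random

def rank_against_candidates(true_score, candidate_scores):
    """Pessimistic rank of the test triplet among candidates (1 = best)."""
    return 1 + sum(s >= true_score for s in candidate_scores)

def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@K over a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits_k

def simulate(model_strength, n_test=200, n_neg=500, n_missing=20, seed=0):
    """Rank each test triplet against negatives plus unknown-but-true
    missing facts, which the closed-world assumption counts as false."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_test):
        true = rng.gauss(model_strength, 1.0)
        negatives = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
        # A strong model also scores the missing (true) facts high, so
        # they outrank the test triplet and drag the metric down.
        missing = [rng.gauss(model_strength, 1.0) for _ in range(n_missing)]
        ranks.append(rank_against_candidates(true, negatives + missing))
    return mrr_and_hits(ranks)

if __name__ == "__main__":
    clean_mrr, _ = simulate(3.0, n_missing=0)
    open_mrr, _ = simulate(3.0, n_missing=20)
    print(f"MRR with no missing facts: {clean_mrr:.3f}")
    print(f"MRR with 20 missing facts: {open_mrr:.3f}")
```

In this toy setting the stronger the model, the higher it scores the missing facts too, so the closed-world MRR saturates well below 1 regardless of model strength, which is the saturation effect the paper analyzes.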
Related papers
- Empowering Graph Invariance Learning with Deep Spurious Infomax (2024-07-13)
  We introduce a novel graph invariance learning paradigm that induces a robust and general inductive bias. EQuAD shows stable and enhanced performance across different degrees of bias in synthetic datasets and on challenging real-world datasets, with gains of up to 31.76%.
- Normalizing Flow-based Neural Process for Few-Shot Knowledge Graph Completion (2023-04-17)
  Few-shot knowledge graph completion (FKGC) aims to predict missing facts for unseen relations from only a few associated facts. Existing FKGC methods are based on metric learning or meta-learning and often suffer from out-of-distribution and overfitting problems. This paper proposes a normalizing flow-based neural process for few-shot knowledge graph completion (NP-FKGC).
- A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal (2022-12-12)
  Knowledge graph reasoning (KGR) aims to deduce new facts from existing facts based on logic rules mined from knowledge graphs (KGs). It has been shown to significantly benefit many AI applications of KGs, such as question answering and recommendation systems.
- A Study on the Evaluation of Generative Models (2022-06-22)
  Implicit generative models, which do not return likelihood values, have become prevalent in recent years. This work studies evaluation metrics for generative models using a high-quality synthetic dataset, showing that while FID and IS do correlate with several f-divergences, their rankings of close models can vary considerably.
- Deconfounded Training for Graph Neural Networks (2021-12-30)
  We present a new paradigm of deconfounded training (DTP) that better mitigates the confounding effect and latches onto the critical information. Specifically, attention modules disentangle the critical subgraph from the trivial subgraph, allowing GNNs to capture a more reliable subgraph whose relation to the label is robust across different distributions.
- MPLR: A Novel Model for Multi-Target Learning of Logical Rules for Knowledge Graph Reasoning (2021-12-12)
  We study the problem of learning logic rules for reasoning on knowledge graphs to complete missing factual triplets. The proposed MPLR model improves on existing models by fully exploiting the training data and considering multi-target scenarios. Experimental results empirically demonstrate that MPLR outperforms state-of-the-art methods on five benchmark datasets.
- Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion (2021-08-03)
  InferWiki improves upon existing benchmarks in inferential ability, assumptions, and patterns: each test sample is predictable from supportive data in the training set. The experiments curate two settings of InferWiki varying in size and structure, and apply the construction process to CoDEx as a comparative dataset.
- Causal Expectation-Maximisation (2020-11-04)
  We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs, and introduce the causal EM algorithm to reconstruct the uncertainty about latent variables from data on categorical manifest variables. We argue that there is an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? (2020-06-09)
  We argue that the standard evaluation metric, which measures overall in-domain accuracy, is misleading, and propose the GQA-OOD benchmark to overcome these concerns.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantee of the quality of the listed information and is not responsible for any consequences of its use.