Realistic Re-evaluation of Knowledge Graph Completion Methods: An
Experimental Study
- URL: http://arxiv.org/abs/2003.08001v1
- Date: Wed, 18 Mar 2020 01:18:09 GMT
- Title: Realistic Re-evaluation of Knowledge Graph Completion Methods: An
Experimental Study
- Authors: Farahnaz Akrami (1), Mohammed Samiul Saeef (1), Qingheng Zhang (2),
Wei Hu (2), Chengkai Li (1) ((1) Department of Computer Science and
Engineering, University of Texas at Arlington, (2) State Key Laboratory for
Novel Software Technology, Nanjing University)
- Abstract summary: This paper is the first systematic study with the main objective of assessing the true effectiveness of embedding models.
Our experimental results show these models are much less accurate than previously perceived.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the active research area of employing embedding models for knowledge graph
completion, particularly for the task of link prediction, most prior studies
used two benchmark datasets, FB15k and WN18, to evaluate such models. Most
triples in these and other datasets in such studies belong to reverse and
duplicate relations, which exhibit high data redundancy due to semantic
duplication, correlation, or data incompleteness. This is a case of excessive
data leakage---a model is trained using features that otherwise would not be
available when the model needs to be applied for real prediction. There are
also Cartesian product relations for which every triple formed by the Cartesian
product of applicable subjects and objects is a true fact. Link prediction on
the aforementioned relations is easy and can be achieved with even better
accuracy using straightforward rules instead of sophisticated embedding models.
A more fundamental defect of these models is that the link prediction scenario,
given such data, is non-existent in the real world. This paper is the first
systematic study with the main objective of assessing the true effectiveness of
embedding models when the unrealistic triples are removed. Our experimental
results show these models are much less accurate than previously perceived.
Their poor accuracy renders link prediction a task without a truly effective
automated solution. Hence, we call for re-investigation of possible effective
approaches.
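The abstract's claim that straightforward rules can match or beat embedding models on reverse, duplicate, and Cartesian product relations can be made concrete with a small sketch. The snippet below is not the authors' code; the function names, the 0.8 overlap threshold, and the toy triples are illustrative assumptions. It detects likely reverse-relation pairs from the training split and then answers (h, r, ?) queries by training-set lookup alone, which is precisely the leakage the paper argues inflates results on FB15k and WN18.
```python
# Illustrative sketch of a rule-based link prediction baseline; all names and the
# 0.8 threshold are assumptions, not taken from the paper's implementation.
from collections import defaultdict

def build_indices(train_triples):
    """Index training triples by (head, relation) and by relation."""
    by_head_rel = defaultdict(set)   # (h, r) -> {t}
    by_rel = defaultdict(set)        # r -> {(h, t)}
    for h, r, t in train_triples:
        by_head_rel[(h, r)].add(t)
        by_rel[r].add((h, t))
    return by_head_rel, by_rel

def find_reverse_pairs(by_rel, threshold=0.8):
    """Flag relation pairs (r1, r2) where (h, r1, t) usually co-occurs with
    (t, r2, h) in training; such pairs leak test answers."""
    reverse_of = {}
    for r1, pairs1 in by_rel.items():
        for r2, pairs2 in by_rel.items():
            if r1 == r2 or not pairs1:
                continue
            overlap = sum((t, h) in pairs2 for h, t in pairs1) / len(pairs1)
            if overlap >= threshold:
                reverse_of[r1] = r2
    return reverse_of

def rule_based_tails(h, r, by_head_rel, by_rel, reverse_of):
    """Answer the query (h, r, ?) with training-set lookups alone."""
    candidates = set(by_head_rel.get((h, r), set()))   # duplicate relation / known triple
    r_inv = reverse_of.get(r)
    if r_inv is not None:                              # reverse relation
        candidates |= {h2 for (h2, t2) in by_rel[r_inv] if t2 == h}
    return candidates

def cartesian_tails(r, by_rel):
    """For a (near-)Cartesian-product relation, any object seen with r is a hit."""
    return {t for _, t in by_rel[r]}

if __name__ == "__main__":
    train = [("paris", "capital_of", "france"),
             ("france", "has_capital", "paris"),
             ("berlin", "capital_of", "germany")]
    by_head_rel, by_rel = build_indices(train)
    reverse_of = find_reverse_pairs(by_rel)
    # The unseen test triple (germany, has_capital, berlin) is recovered purely
    # because its reverse appears in the training set.
    print(rule_based_tails("germany", "has_capital", by_head_rel, by_rel, reverse_of))
```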
Related papers
- Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales [10.397502254316645]
We propose a two-phase scheme to ensure double-correct predictions.
First, we curate a new dataset that offers structured rationales for visual recognition tasks.
Second, we propose a rationale-informed optimization method to guide the model in disentangling and localizing visual evidence.
arXiv Detail & Related papers (2024-10-31T18:33:39Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to perform well only on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - Towards Causal Deep Learning for Vulnerability Detection [31.59558109518435]
We introduce do-calculus-based causal learning to software engineering models.
Our results show that CausalVul consistently improved model accuracy, robustness, and OOD performance.
arXiv Detail & Related papers (2023-10-12T00:51:06Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - An Empirical Study on Data Leakage and Generalizability of Link
Prediction Models for Issues and Commits [7.061740334417124]
LinkFormer preserves and improves the accuracy of existing predictions.
Our findings indicate that, to simulate real-world scenarios effectively, researchers must maintain the temporal flow of data.
arXiv Detail & Related papers (2022-11-01T10:54:26Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest that future dataset creation include a simple model as a difficulty/bias probe and that future model development use a clean, non-overlapping site and date split (see the split sketch after this list).
arXiv Detail & Related papers (2021-04-20T17:16:41Z) - Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z) - Debiasing Skin Lesion Datasets and Models? Not So Fast [17.668005682385175]
Models learned from data risk learning biases from that same data.
When models learn spurious correlations not found in real-world situations, their deployment for critical tasks, such as medical decisions, can be catastrophic.
We find that, despite interesting results that point to promising future research, current debiasing methods are not ready to solve the bias issue for skin-lesion models.
arXiv Detail & Related papers (2020-04-23T21:07:49Z)