Out-of-Vocabulary Entities in Link Prediction
- URL: http://arxiv.org/abs/2105.12524v1
- Date: Wed, 26 May 2021 12:58:18 GMT
- Title: Out-of-Vocabulary Entities in Link Prediction
- Authors: Caglar Demir and Axel-Cyrille Ngonga Ngomo
- Abstract summary: Link prediction is often used as a proxy to evaluate the quality of embeddings.
As benchmarks are crucial for the fair comparison of algorithms, ensuring their quality is tantamount to providing a solid ground for developing better solutions.
We provide an implementation of an approach for spotting and removing such entities and provide corrected versions of the datasets WN18RR, FB15K-237, and YAGO3-10.
- Score: 1.9036571490366496
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge graph embedding techniques are key to making knowledge graphs
amenable to the plethora of machine learning approaches based on vector
representations. Link prediction is often used as a proxy to evaluate the
quality of these embeddings. Given that the creation of benchmarks for link
prediction is a time-consuming endeavor, most work on the subject matter uses
only a few benchmarks. As benchmarks are crucial for the fair comparison of
algorithms, ensuring their quality is tantamount to providing a solid ground
for developing better solutions to link prediction and ipso facto embedding
knowledge graphs. First studies of benchmarks pointed to limitations pertaining
to information leaking from the development to the test fragments of some
benchmark datasets. We spotted a further common limitation of three of the
benchmarks commonly used for evaluating link prediction approaches:
out-of-vocabulary entities in the test and validation sets. We provide an
implementation of an approach for spotting and removing such entities and
provide corrected versions of the datasets WN18RR, FB15K-237, and YAGO3-10. Our
experiments on the corrected versions of WN18RR, FB15K-237, and YAGO3-10
suggest that the measured performance of state-of-the-art approaches is altered
significantly with p-values <1%, <1.4%, and <1%, respectively. Overall,
state-of-the-art approaches gain on average absolute $3.29 \pm 0.24\%$ in all
metrics on WN18RR. This means that some of the conclusions achieved in previous
works might need to be revisited. We provide an open-source implementation of
our experiments and corrected datasets at
https://github.com/dice-group/OOV-In-Link-Prediction.
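The OOV-removal step itself is simple to reproduce. Below is a minimal sketch, not the authors' released implementation (see the repository above); it assumes the conventional train.txt/valid.txt/test.txt layout with tab-separated (head, relation, tail) triples, and drops every validation/test triple containing an entity or relation unseen in training.

```python
# Minimal OOV-removal sketch (assumed file layout, not the released code).
from pathlib import Path

def read_triples(path):
    """Read tab-separated (head, relation, tail) triples from a file."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

def remove_oov(dataset_dir):
    train = read_triples(Path(dataset_dir) / "train.txt")
    # Vocabulary is defined by what occurs in the training split.
    entities = {h for h, _, _ in train} | {t for _, _, t in train}
    relations = {r for _, r, _ in train}
    for split in ("valid.txt", "test.txt"):
        path = Path(dataset_dir) / split
        triples = read_triples(path)
        kept = [(h, r, t) for h, r, t in triples
                if h in entities and t in entities and r in relations]
        print(f"{split}: removed {len(triples) - len(kept)} of {len(triples)} triples")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines("\t".join(triple) + "\n" for triple in kept)

remove_oov("WN18RR")  # assumes the usual benchmark directory layout
```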
Related papers
- New Directions in Text Classification Research: Maximizing The Performance of Sentiment Classification from Limited Data [0.0]
A benchmark dataset is provided for training and testing on the issue of Kaesang Pangarep's appointment as Chairman of PSI.
The official score is the F1-score, which balances precision and recall across the three classes: positive, negative, and neutral.
Both scores (baseline and optimized) use the SVM method, which is widely reported as the state-of-the-art among conventional machine learning methods.
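For reference, a minimal illustration of such a score, assuming macro-averaged F1 (which weights the three classes equally; the paper may average differently):

```python
from sklearn.metrics import f1_score

y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]
# Macro-averaging computes F1 per class and takes the unweighted mean,
# balancing precision and recall across positive, negative, and neutral.
print(f1_score(y_true, y_pred, average="macro"))
```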
arXiv Detail & Related papers (2024-07-08T05:42:29Z)
- KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation [19.31783654838732]
We use large language models to generate coherent descriptions, bridging the semantic gap between queries and answers.
We also utilize inverse relations to create a symmetric graph, thereby providing augmented training samples for KGC.
Our approach achieves a 4.2% improvement in Hit@1 on WN18RR and a 3.4% improvement in Hit@3 on FB15k-237, demonstrating superior performance.
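The inverse-relation augmentation is easy to sketch (a hypothetical illustration, not KERMIT's code): every triple (h, r, t) yields an additional training triple (t, r⁻¹, h).

```python
def add_inverse_triples(triples):
    # For each (head, relation, tail), also emit (tail, relation_inverse, head),
    # making the training graph symmetric. The naming scheme is an assumption.
    return list(triples) + [(t, f"{r}_inverse", h) for h, r, t in triples]

print(add_inverse_triples([("paris", "capital_of", "france")]))
# [('paris', 'capital_of', 'france'), ('france', 'capital_of_inverse', 'paris')]
```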
arXiv Detail & Related papers (2023-09-26T09:03:25Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
The robustness metric is refined accordingly: a model is judged robust only if its performance is consistently accurate across all examples of a clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Cross Version Defect Prediction with Class Dependency Embeddings [17.110933073074584]
We use the Class Dependency Network (CDN) as another predictor for defects, combined with static code metrics.
Our approach uses network embedding techniques to leverage CDN information without having to build the metrics manually.
arXiv Detail & Related papers (2022-12-29T18:24:39Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
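The ATC recipe is compact enough to sketch (a hedged illustration using max-softmax confidence as the score; not the authors' code):

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    # Choose threshold t on labeled source data so that the fraction of source
    # examples with confidence above t matches the measured source accuracy...
    source_acc = np.mean(source_correct)
    t = np.quantile(source_conf, 1.0 - source_acc)
    # ...then predict target accuracy as the fraction of unlabeled target
    # examples whose confidence exceeds t.
    return float(np.mean(target_conf > t))

rng = np.random.default_rng(0)
source_conf = rng.uniform(0.5, 1.0, 1000)              # confidences on source
source_correct = rng.uniform(size=1000) < source_conf  # 0/1 correctness labels
target_conf = rng.uniform(0.4, 1.0, 1000)              # unlabeled target scores
print(atc_predict_accuracy(source_conf, source_correct, target_conf))
```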
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion [79.07695173192472]
InferWiki improves upon existing benchmarks in inferential ability, assumptions, and patterns.
Each testing sample is predictable with supportive data in the training set.
In experiments, we curate two settings of InferWiki varying in size and structure, and apply the construction process to CoDEx as comparative datasets.
arXiv Detail & Related papers (2021-08-03T09:51:15Z)
- COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences [21.11065466376105]
Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI).
Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets.
We introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements.
arXiv Detail & Related papers (2021-06-02T06:31:55Z)
- Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.