Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out
- URL: http://arxiv.org/abs/2102.02307v1
- Date: Wed, 3 Feb 2021 21:47:37 GMT
- Title: Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out
- Authors: Peiran Yao and Denilson Barbosa
- Abstract summary: We propose an active typing error detection algorithm that maximizes the utilization of both gold and noisy labels.
The outcomes of this study provide guidelines for researchers to use noisy factual KGs.
- Score: 11.534085606272242
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Factual knowledge graphs (KGs) such as DBpedia and Wikidata have served as
part of various downstream tasks and are also widely adopted by artificial
intelligence research communities as benchmark datasets. However, we found
these KGs to be surprisingly noisy. In this study, we question the quality of
these KGs, where the typing error rate is estimated to be 27% for
coarse-grained types on average, and even 73% for certain fine-grained types.
In pursuit of solutions, we propose an active typing error detection algorithm
that maximizes the utilization of both gold and noisy labels. We also
comprehensively discuss and compare unsupervised, semi-supervised, and
supervised paradigms to deal with typing errors in factual KGs. The outcomes of
this study provide guidelines for researchers to use noisy factual KGs. To help
practitioners deploy the techniques and conduct further research, we published
our code and data.
Related papers
- Learning Rules from KGs Guided by Language Models [48.858741745144044]
Rule learning methods can be applied to predict potentially missing facts.
Ranking of rules is especially challenging over highly incomplete or biased KGs.
With the recent rise of Language Models (LMs), several works have claimed that LMs can be used as an alternative means for KG completion.
arXiv Detail & Related papers (2024-09-12T09:27:36Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal [57.8455911689554]
Knowledge graph reasoning (KGR) aims to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs).
It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems.
arXiv Detail & Related papers (2022-12-12T08:40:04Z)
- Contrastive Knowledge Graph Error Detection [11.637359888052014]
We propose a novel framework, ContrAstive knowledge Graph Error Detection (CAGED).
CAGED introduces contrastive learning into KG learning and provides a novel way of modeling KG.
It outperforms state-of-the-art methods in KG error detection.
arXiv Detail & Related papers (2022-11-18T05:01:19Z)
- KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data.
In our experiments, we highlight the findings with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z)
- Trustworthy Knowledge Graph Completion Based on Multi-sourced Noisy Data [35.938323660176145]
We propose a new trustworthy method that exploits facts for a knowledge graph based on multi-sourced noisy data and existing facts in the KG.
Specifically, we introduce a graph neural network with a holistic scoring function to judge the plausibility of facts with various value types.
We present a truth inference model that incorporates data source qualities into the fact scoring function, and design a semi-supervised learning method to infer the truths from heterogeneous values.
arXiv Detail & Related papers (2022-01-21T07:59:16Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Handling Missing Data with Graph Representation Learning [62.59831675688714]
We propose GRAPE, a graph-based framework for feature imputation as well as label prediction.
Under GRAPE, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task.
Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks.
arXiv Detail & Related papers (2020-10-30T17:59:13Z)
- Efficient Knowledge Graph Validation via Cross-Graph Representation Learning [40.570585195713704]
Noisy facts are unavoidably introduced into knowledge graphs, for example by automatic extraction.
We propose a cross-graph representation learning framework, i.e., CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently.
arXiv Detail & Related papers (2020-08-16T20:51:17Z)
- Entity Type Prediction in Knowledge Graphs using Embeddings [2.7528170226206443]
Open Knowledge Graphs (such as DBpedia, Wikidata, YAGO) have been recognized as the backbone of diverse applications in the field of data mining and information retrieval.
Most of these KGs are created via automated information extraction from snapshots, via information contributed by users, or using Wikipedia.
It has been observed that the type information of these KGs is often noisy, incomplete, and incorrect.
A multi-label classification approach is proposed in this work for entity typing using KG embeddings.
arXiv Detail & Related papers (2020-04-28T17:57:08Z)
- Guiding Graph Embeddings using Path-Ranking Methods for Error Detection in noisy Knowledge Graphs [0.0]
This work presents various mainstream approaches and proposes a hybrid and modular methodology for the task.
We compare different methods on two benchmarks and one real-world biomedical publications dataset, showcasing the potential of our approach.
arXiv Detail & Related papers (2020-02-19T11:04:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.