ProVe: A Pipeline for Automated Provenance Verification of Knowledge
Graphs against Textual Sources
- URL: http://arxiv.org/abs/2210.14846v1
- Date: Wed, 26 Oct 2022 16:47:36 GMT
- Title: ProVe: A Pipeline for Automated Provenance Verification of Knowledge
Graphs against Textual Sources
- Authors: Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl
- Abstract summary: ProVe is a pipelined approach that automatically verifies whether a Knowledge Graph triple is supported by text extracted from its documented provenance.
ProVe is evaluated on a Wikidata dataset, achieving promising results overall and excellent performance on the binary classification task of detecting support from provenance.
- Score: 5.161088104035106
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge Graphs are repositories of information that gather data from a
multitude of domains and sources in the form of semantic triples, serving as a
source of structured data for various crucial applications in the modern web
landscape, from Wikipedia infoboxes to search engines. Such graphs mainly serve
as secondary sources of information and depend on well-documented and
verifiable provenance to ensure their trustworthiness and usability. However,
their ability to systematically assess and assure the quality of this
provenance, most crucially whether it properly supports the graph's
information, relies mainly on manual processes that do not scale with size.
ProVe aims at remedying this, consisting of a pipelined approach that
automatically verifies whether a Knowledge Graph triple is supported by text
extracted from its documented provenance. ProVe is intended to assist
information curators and consists of four main steps involving rule-based
methods and machine learning models: text extraction, triple verbalisation,
sentence selection, and claim verification. ProVe is evaluated on a Wikidata
dataset, achieving promising results overall and excellent performance on the
binary classification task of detecting support from provenance, with 87.5%
accuracy and 82.9% F1-macro on text-rich sources. The evaluation data and
scripts used in this paper are available on GitHub and Figshare.
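As a rough illustration of the four-stage pipeline named in the abstract (text extraction, triple verbalisation, sentence selection, claim verification), the sketch below chains simple stand-ins for each stage. The function names, the token-overlap ranking used for sentence selection, and the keyword heuristic used for verification are illustrative assumptions, not the authors' implementation, which relies on rule-based extraction and trained language models.

```python
# Minimal sketch of a ProVe-style pipeline (assumed structure, not the authors' code).
# Each stage is a placeholder: a real system would use an HTML text extractor,
# a trained verbalisation model, a learned sentence ranker, and an NLI-style
# claim-verification classifier.

from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str

def extract_text(reference_document: str) -> list[str]:
    """Stage 1 (text extraction): split the referenced document into sentences."""
    return [s.strip() for s in reference_document.split(".") if s.strip()]

def verbalise(triple: Triple) -> str:
    """Stage 2 (triple verbalisation): turn the triple into a natural-language claim.
    A template stands in for the paper's data-to-text model."""
    return f"{triple.subject} {triple.predicate} {triple.obj}"

def select_sentences(claim: str, sentences: list[str], k: int = 3) -> list[str]:
    """Stage 3 (sentence selection): rank sentences by relevance to the claim.
    Token overlap stands in for a learned relevance scorer."""
    claim_tokens = set(claim.lower().split())
    ranked = sorted(sentences, key=lambda s: -len(claim_tokens & set(s.lower().split())))
    return ranked[:k]

def verify_claim(claim: str, evidence: list[str]) -> str:
    """Stage 4 (claim verification): decide whether the evidence supports the claim.
    A trivial subset check replaces the paper's textual-entailment classifier."""
    claim_tokens = set(claim.lower().split())
    for sentence in evidence:
        if claim_tokens <= set(sentence.lower().split()):
            return "SUPPORTED"
    return "NOT ENOUGH INFO"

if __name__ == "__main__":
    triple = Triple("Douglas Adams", "educated at", "St John's College")
    document = ("Douglas Adams was educated at St John's College. "
                "He later wrote The Hitchhiker's Guide to the Galaxy.")
    claim = verbalise(triple)
    evidence = select_sentences(claim, extract_text(document))
    print(verify_claim(claim, evidence))  # prints "SUPPORTED" for this toy input
```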
Related papers
- Triplètoile: Extraction of Knowledge from Microblogging Text [7.848242781280095]
We propose an enhanced information extraction pipeline tailored to the extraction of a knowledge graph comprising open-domain entities from micro-blogging posts on social media platforms.
Our pipeline leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over word embeddings.
We provide a use case on extracting semantic triples from a corpus of 100 thousand tweets about digital transformation and publicly release the generated knowledge graph.
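The unsupervised relation grouping mentioned in this entry can be pictured with a small sketch: the toy phrases, the hand-written 4-dimensional embeddings, and the use of scikit-learn's AgglomerativeClustering are assumptions for illustration, not the authors' pipeline, which also relies on dependency parsing to find the phrases.

```python
# Sketch of unsupervised relation grouping via hierarchical clustering over
# word embeddings. Toy vectors stand in for averaged pre-trained embeddings.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical relation phrases extracted from tweets, with toy embeddings.
relation_phrases = ["acquires", "buys", "partners with", "collaborates with"]
embeddings = np.array([
    [0.9, 0.1, 0.0, 0.2],   # "acquires"
    [0.8, 0.2, 0.1, 0.1],   # "buys"               -> close to "acquires"
    [0.1, 0.9, 0.8, 0.0],   # "partners with"
    [0.2, 0.8, 0.9, 0.1],   # "collaborates with"  -> close to "partners with"
])

# Hierarchical (agglomerative) clustering groups similar phrases; each cluster
# can then be treated as one unlabelled relation type.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)

for phrase, label in zip(relation_phrases, labels):
    print(f"cluster {label}: {phrase}")
```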
arXiv Detail & Related papers (2024-08-27T09:35:13Z)
- FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection [54.37159298632628]

FineFake is a multi-domain knowledge-enhanced benchmark for fake news detection.
FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms.
The entire FineFake project is publicly accessible as an open-source repository.
arXiv Detail & Related papers (2024-03-30T14:39:09Z)
- Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches [0.0]
Information extraction plays an important role in natural language processing.
Document genre and length influence IE task performance.
No single method demonstrated clearly superior performance on both tasks.
arXiv Detail & Related papers (2023-06-30T20:43:27Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- FabKG: A Knowledge graph of Manufacturing Science domain utilizing structured and unconventional unstructured knowledge source [1.2597961235465307]
We develop knowledge graphs based upon entity and relation data for both commercial and educational uses.
We propose a novel crowdsourcing method for KG creation by leveraging student notes.
We have created a knowledge graph containing 65000+ triples using all data sources.
arXiv Detail & Related papers (2022-05-24T02:32:04Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Contribution of Conceptual Modeling to Enhancing Historians' Intuition - Application to Prosopography [0.0]
We propose a process that automatically supports historians' intuition in prosopography.
The contribution is threefold: a conceptual data model, a process model, and a set of rules combining the reliability of sources and the credibility of information.
arXiv Detail & Related papers (2020-11-26T13:21:36Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
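As a rough illustration of what an entity masking scheme can look like, the sketch below masks whole entity mentions found via a knowledge graph rather than random tokens; the entity list, whitespace-level replacement, and [MASK] token are assumptions for illustration, not the paper's setup, which builds this into masked language model pre-training.

```python
# Sketch of knowledge-graph-guided entity-level masking.
KG_ENTITIES = {"Douglas Adams", "St John's College"}  # hypothetical KG entity labels

def mask_entities(sentence: str, entities: set[str], mask_token: str = "[MASK]") -> str:
    """Replace every occurrence of a known entity mention with a single mask span."""
    masked = sentence
    for entity in sorted(entities, key=len, reverse=True):  # longest mentions first
        masked = masked.replace(entity, mask_token)
    return masked

print(mask_entities("Douglas Adams was educated at St John's College.", KG_ENTITIES))
# -> "[MASK] was educated at [MASK]."
```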
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.