Related papers: Wikidated 1.0: An Evolving Knowledge Graph Dataset of Wikidata's Revision History

URL: http://arxiv.org/abs/2112.05003v1
Date: Thu, 9 Dec 2021 15:54:03 GMT
Title: Wikidated 1.0: An Evolving Knowledge Graph Dataset of Wikidata's Revision History
Authors: Lukas Schmelzeisen, Corina Dima, Steffen Staab
Abstract summary: We present Wikidated 1.0, a dataset of Wikidata's full revision history. To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph.
Score: 5.727994421498849
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Wikidata is the largest general-interest knowledge base that is openly available. It is collaboratively edited by thousands of volunteer editors and has thus evolved considerably since its inception in 2012. In this paper, we present Wikidated 1.0, a dataset of Wikidata's full revision history, which encodes changes between Wikidata revisions as sets of deletions and additions of RDF triples. To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph, a recently emerging research subject in the Semantic Web community. We introduce the methodology for generating Wikidated 1.0 from dumps of Wikidata, discuss its implementation and limitations, and present statistical characteristics of the dataset.
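The encoding described in the abstract, where each change between two revisions is a set of deleted triples and a set of added triples, can be illustrated with a minimal sketch. This is not the Wikidated implementation: the names `RevisionDelta`, `diff_revisions`, and `apply_delta` are hypothetical, triples are modelled as plain string tuples, and the sample data is made up for illustration.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

# Assumption: a triple is modelled as a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


class RevisionDelta(NamedTuple):
    """Hypothetical container for the change between two revisions."""
    deletions: FrozenSet[Triple]
    additions: FrozenSet[Triple]


def diff_revisions(old: Iterable[Triple], new: Iterable[Triple]) -> RevisionDelta:
    """Compute which triples were removed and which were added."""
    old_set, new_set = frozenset(old), frozenset(new)
    return RevisionDelta(deletions=old_set - new_set, additions=new_set - old_set)


def apply_delta(triples: Iterable[Triple], delta: RevisionDelta) -> FrozenSet[Triple]:
    """Replay a delta on top of a revision to obtain the next revision."""
    return (frozenset(triples) - delta.deletions) | delta.additions


# Illustrative data only: two revisions of a single entity.
rev1 = {
    ("wd:Q42", "rdfs:label", "Douglas Adams"),
    ("wd:Q42", "schema:description", "English writer"),
}
rev2 = {
    ("wd:Q42", "rdfs:label", "Douglas Adams"),
    ("wd:Q42", "schema:description", "English writer and humorist"),
}

delta = diff_revisions(rev1, rev2)
reconstructed = apply_delta(rev1, delta)
```

Storing only the deltas keeps unchanged triples from being repeated for every revision, and any revision can be rebuilt by replaying the deltas in order from an initial state.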

Related papers

Diagnosing and Mitigating Semantic Inconsistencies in Wikidata's Classification Hierarchy [1.4705700441788643]
Wikidata is the largest open knowledge graph on the web, encompassing over 120 million entities. This study proposes and applies a novel validation method to confirm the presence of classification errors and over-generalized subclass links. We develop a system that allows users to inspect the taxonomic relationships of arbitrary Wikidata entities.
arXiv Detail & Related papers (2025-11-07T02:09:00Z)
EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge [48.36331802345063]
We propose a method for lifelong construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages. The resulting dataset comprises 376K Wikipedia passages aligned with a total of 1.25M KG edits over 10 different snapshots of Wikidata from 2019 to 2025.
arXiv Detail & Related papers (2025-07-04T14:43:21Z)
Towards a Brazilian History Knowledge Graph [50.26735825937335]
We construct a knowledge graph for Brazilian history based on the Brazilian Dictionary of Historical Biographies (DHBB) and Wikipedia/Wikidata. We show that many terms/entities described in the DHBB do not have corresponding concepts (or Q items) in Wikidata.
arXiv Detail & Related papers (2024-03-28T22:05:32Z)
Wikidata as a seed for Web Extraction [4.273966905160028]
We present a framework that identifies and extracts new facts published across multiple Web domains. We take inspiration from ideas used to extract facts from textual collections and adapt them to extract facts from Web pages. Our experiments show that we can achieve a mean F1-score of 84.07.
arXiv Detail & Related papers (2024-01-15T16:35:52Z)
Leveraging Wikidata's edit history in knowledge graph refinement tasks [77.34726150561087]
Edit history represents the process in which the community reaches some kind of fuzzy and distributed consensus. We build a dataset containing the edit history of every instance from the 100 most important classes in Wikidata. We propose and evaluate two new methods to leverage this edit history information in knowledge graph embedding models for type prediction tasks.
arXiv Detail & Related papers (2022-10-27T14:32:45Z)
Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles. The dataset consists of over 80k English samples on 6987 topics. Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z)
Enriching Wikidata with Linked Open Data [4.311189028205597]
Current linked open data (LOD) tools are not suitable for enriching large graphs like Wikidata. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. Our experiments show that our workflow can enrich Wikidata with millions of novel, high-quality statements from external LOD sources.
arXiv Detail & Related papers (2022-07-01T01:50:24Z)
Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling. We use the profile to query the indexed search engine to retrieve candidate entities. Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
Survey on English Entity Linking on Wikidata [3.8289963781051415]
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure.
arXiv Detail & Related papers (2021-12-03T16:02:42Z)
Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages. We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata. The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
Commonsense Knowledge in Wikidata [3.8359194344969807]
This paper investigates whether Wikidata contains commonsense knowledge which is complementary to existing commonsense sources. We map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph.
arXiv Detail & Related papers (2020-08-18T18:23:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.