Enriching Wikidata with Linked Open Data
- URL: http://arxiv.org/abs/2207.00143v1
- Date: Fri, 1 Jul 2022 01:50:24 GMT
- Title: Enriching Wikidata with Linked Open Data
- Authors: Bohui Zhang, Filip Ilievski, Pedro Szekely
- Abstract summary: Current linked open data (LOD) tools are not suitable for enriching large graphs like Wikidata.
We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation.
Our experiments show that our workflow can enrich Wikidata with millions of high-quality novel statements from external LOD sources.
- Score: 4.311189028205597
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large public knowledge graphs, like Wikidata, contain billions of statements
about tens of millions of entities, thus inspiring various use cases to exploit
such knowledge graphs. However, practice shows that much of the relevant
information that fits users' needs is still missing in Wikidata, while current
linked open data (LOD) tools are not suitable for enriching large graphs like
Wikidata. In this paper, we investigate the potential of enriching Wikidata
with structured data sources from the LOD cloud. We present a novel workflow
that includes gap detection, source selection, schema alignment, and semantic
validation. We evaluate our enrichment method with two complementary LOD
sources: a noisy source with broad coverage, DBpedia, and a manually curated
source with a narrow focus on the art domain, Getty. Our experiments show that
our workflow can enrich Wikidata with millions of high-quality novel statements
from external LOD sources. Property alignment and data quality
are key challenges, whereas entity alignment and source selection are
well-supported by existing Wikidata mechanisms. We make our code and data
available to support future work.
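As a rough sketch of what the gap detection and source selection steps described above could look like in practice (not the authors' released code; the class Q3305213 "painting", property P170 "creator", the DBpedia predicate dbo:author, and the SPARQLWrapper library are all illustrative assumptions), one might query Wikidata for items missing a statement and then look up candidate values for those items in DBpedia:

```python
# Illustrative sketch only: gap detection + source selection against public
# SPARQL endpoints. IDs and predicates are example choices, not the paper's
# actual configuration.
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA = "https://query.wikidata.org/sparql"
DBPEDIA = "https://dbpedia.org/sparql"

def run_query(endpoint, query):
    """Execute a SPARQL query and return the JSON result bindings."""
    client = SPARQLWrapper(endpoint, agent="enrichment-sketch/0.1")
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]

def detect_gaps(class_qid="Q3305213", prop_pid="P170", limit=5):
    """Gap detection: find Wikidata items of a class that lack a given property."""
    query = f"""
    SELECT ?item ?sitelink WHERE {{
      ?item wdt:P31 wd:{class_qid} .                    # instance of target class
      FILTER NOT EXISTS {{ ?item wdt:{prop_pid} [] }}   # property is missing
      ?sitelink schema:about ?item ;
                schema:isPartOf <https://en.wikipedia.org/> .
    }}
    LIMIT {limit}
    """
    return run_query(WIKIDATA, query)

def dbpedia_candidates(wikipedia_title, predicate="dbo:author"):
    """Source selection: fetch candidate values from DBpedia, using the shared
    Wikipedia title as the entity-alignment key."""
    query = f"""
    SELECT ?value WHERE {{
      <http://dbpedia.org/resource/{wikipedia_title}> {predicate} ?value .
    }}
    """
    return run_query(DBPEDIA, query)

if __name__ == "__main__":
    for row in detect_gaps():
        title = row["sitelink"]["value"].rsplit("/", 1)[-1]
        print(row["item"]["value"], "->",
              [v["value"]["value"] for v in dbpedia_candidates(title)])
```

Schema alignment (mapping a predicate such as dbo:author onto the corresponding Wikidata property) and semantic validation of the retrieved values would follow these two steps; neither is shown in this sketch.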
Related papers
- KIF: A Wikidata-Based Framework for Integrating Heterogeneous Knowledge Sources [0.45141207783683707]
We present a Wikidata-based framework, called KIF, for virtually integrating heterogeneous knowledge sources.
KIF is written in Python and is released as open-source.
arXiv Detail & Related papers (2024-03-15T13:46:36Z) - Leveraging Wikidata's edit history in knowledge graph refinement tasks [77.34726150561087]
The edit history represents the process by which the community reaches some kind of fuzzy and distributed consensus.
We build a dataset containing the edit history of every instance from the 100 most important classes in Wikidata.
We propose and evaluate two new methods to leverage this edit history information in knowledge graph embedding models for type prediction tasks.
arXiv Detail & Related papers (2022-10-27T14:32:45Z) - Does Wikidata Support Analogical Reasoning? [17.68704739786042]
We investigate whether the knowledge in Wikidata supports analogical reasoning.
We show that Wikidata can be used to create data for analogy classification.
We devise a set of metrics to guide an automatic method for extracting analogies from Wikidata.
arXiv Detail & Related papers (2022-10-02T20:46:52Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - Survey on English Entity Linking on Wikidata [3.8289963781051415]
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph.
Current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia.
Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure.
arXiv Detail & Related papers (2021-12-03T16:02:42Z) - Deep Transfer Learning for Multi-source Entity Linkage via Domain
Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z) - Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models, and the results verify the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z) - Commonsense Knowledge in Wikidata [3.8359194344969807]
This paper investigates whether Wikidata contains commonsense knowledge which is complementary to existing commonsense sources.
We map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph.
arXiv Detail & Related papers (2020-08-18T18:23:06Z)