Creating and Querying Personalized Versions of Wikidata on a Laptop
- URL: http://arxiv.org/abs/2108.07119v2
- Date: Wed, 18 Aug 2021 06:31:15 GMT
- Title: Creating and Querying Personalized Versions of Wikidata on a Laptop
- Authors: Hans Chalupsky, Pedro Szekely, Filip Ilievski, Daniel Garijo and Kartik Shenoy
- Abstract summary: This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop.
We present several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop.
- Score: 0.7449724123186383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Application developers today have three choices for exploiting the knowledge
present in Wikidata: they can download the Wikidata dumps in JSON or RDF
format, they can use the Wikidata API to get data about individual entities, or
they can use the Wikidata SPARQL endpoint. None of these methods can support
complex, yet common, query use cases, such as retrieval of large amounts of
data or aggregations over large fractions of Wikidata. This paper introduces
KGTK Kypher, a query language and processor that allows users to create
personalized variants of Wikidata on a laptop. We present several use cases
that illustrate the types of analyses that Kypher enables users to run on the
full Wikidata KG on a laptop, combining data from external resources such as
DBpedia. The Kypher queries for these use cases run much faster on a laptop
than the equivalent SPARQL queries on a Wikidata clone running on a powerful
server with 24h time-out limits.
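For illustration, here is a minimal sketch (not taken from the paper) of the kind of Kypher query the abstract describes, run through the KGTK command line. The input file name claims.tsv.gz and its contents are assumptions; the example only presumes a KGTK edge file with the standard node1/label/node2 columns.
```
# Hypothetical example: count how many humans hold each occupation in a
# local Wikidata edge file, and list the ten most common occupations.
# P31 = instance of, Q5 = human, P106 = occupation; claims.tsv.gz is an
# assumed file name, not one given in the paper.
kgtk query -i claims.tsv.gz \
     --match '(person)-[:P31]->(:Q5), (person)-[:P106]->(occupation)' \
     --return 'occupation, count(distinct person) as num_people' \
     --order-by 'count(distinct person) desc' \
     --limit 10
```
Kypher compiles such queries to SQL over an indexed SQLite cache built from the KGTK files, which is what makes aggregations of this kind over the full graph feasible on a laptop.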
Related papers
- KIF: A Wikidata-Based Framework for Integrating Heterogeneous Knowledge Sources [0.45141207783683707]
We present a Wikidata-based framework, called KIF, for virtually integrating heterogeneous knowledge sources.
KIF is written in Python and is released as open-source.
arXiv Detail & Related papers (2024-03-15T13:46:36Z)
- Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata [6.716263690738313]
This paper presents WikiWebQuestions, a high-quality question answering benchmark for Wikidata.
It consists of real-world data with SPARQL.
We modify SPARQL to use the unique domain and property names instead of their IDs.
arXiv Detail & Related papers (2023-05-23T16:20:43Z)
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to the execution results of those parses.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus on several aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- Enriching Wikidata with Linked Open Data [4.311189028205597]
Current linked open data (LOD) tools are not suitable for enriching large graphs like Wikidata.
We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation.
Our experiments show that our workflow can enrich Wikidata with millions of high-quality novel statements from external LOD sources.
arXiv Detail & Related papers (2022-07-01T01:50:24Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases [67.33560134350427]
TempQA-WD is a benchmark dataset for temporal reasoning.
It is based on Wikidata, which is the most frequently curated, openly available knowledge base.
arXiv Detail & Related papers (2022-01-15T08:49:09Z)
- Survey on English Entity Linking on Wikidata [3.8289963781051415]
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph.
Current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia.
Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure.
arXiv Detail & Related papers (2021-12-03T16:02:42Z)
- A Chinese Multi-type Complex Questions Answering Dataset over Wikidata [45.31495982252219]
Complex Knowledge Base Question Answering has been a popular research area over the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
We propose CLC-QuAD, the first large scale complex Chinese semantic parsing dataset over Wikidata to address these challenges.
arXiv Detail & Related papers (2021-11-11T07:39:16Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Wikidata on MARS [0.20305676256390934]
Multi-attributed relational structures (MARSs) have been proposed as a formal data model for generalized property graphs.
MARPL is a useful rule-based logic in which to write inference rules over property graphs.
Wikidata can be modelled in an extended MARS that adds the (imprecise) datatypes of Wikidata.
arXiv Detail & Related papers (2020-08-14T22:58:04Z)