EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge
- URL: http://arxiv.org/abs/2507.03617v1
- Date: Fri, 04 Jul 2025 14:43:21 GMT
- Title: EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge
- Authors: Klim Zaporojets, Daniel Daza, Edoardo Barba, Ira Assent, Roberto Navigli, Paul Groth
- Abstract summary: We propose a method for lifelong construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages. The resulting dataset comprises 376K Wikipedia passages aligned with a total of 1.25M KG edits over 10 different snapshots of Wikidata from 2019 to 2025.
- Score: 48.36331802345063
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Knowledge Graphs (KGs) are structured knowledge repositories containing entities and relations between them. In this paper, we investigate the problem of automatically updating KGs over time with respect to the evolution of knowledge in unstructured textual sources. This problem requires identifying a wide range of update operations based on the state of an existing KG at a specific point in time. This contrasts with traditional information extraction pipelines, which extract knowledge from text independently of the current state of a KG. To address this challenge, we propose a method for lifelong construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot. The resulting dataset comprises 376K Wikipedia passages aligned with a total of 1.25M KG edits over 10 different snapshots of Wikidata from 2019 to 2025. Our experimental results highlight challenges in updating KG snapshots based on emerging textual knowledge, positioning the dataset as a valuable benchmark for future research. We will publicly release our dataset and model implementations.
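The pairing described in the abstract (a text passage, a KG snapshot, and the edit operations the passage induces in that snapshot) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `Edit`, `PassageExample`, and `apply_edits`, the two operation labels `"add"` and `"remove"`, and the toy entities are hypothetical and are not taken from the paper or its released dataset, which may define a wider range of operations and a different schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# A KG fact as a (subject, relation, object) triple.
Triple = Tuple[str, str, str]


@dataclass
class Edit:
    """One edit against a snapshot. The labels 'add'/'remove' are hypothetical."""
    op: str
    triple: Triple


@dataclass
class PassageExample:
    """A passage paired with the edits it induces in one KG snapshot."""
    passage: str
    snapshot_id: str
    edits: List[Edit]


def apply_edits(snapshot: FrozenSet[Triple], edits: List[Edit]) -> FrozenSet[Triple]:
    """Return a new snapshot with the edits applied in order."""
    triples = set(snapshot)
    for edit in edits:
        if edit.op == "add":
            triples.add(edit.triple)
        elif edit.op == "remove":
            triples.discard(edit.triple)
        else:
            raise ValueError(f"unknown edit operation: {edit.op}")
    return frozenset(triples)


# Toy snapshot and passage; the entities are invented placeholders.
snapshot = frozenset({
    ("Team_A", "head_coach", "Person_X"),
    ("Team_A", "country", "Country_Z"),
})
example = PassageExample(
    passage="Team A announced that Person Y replaces Person X as head coach.",
    snapshot_id="toy-snapshot",
    edits=[
        Edit("remove", ("Team_A", "head_coach", "Person_X")),
        Edit("add", ("Team_A", "head_coach", "Person_Y")),
    ],
)
updated = apply_edits(snapshot, example.edits)
```

The point of the sketch is that the correct edits depend on the snapshot: the same passage would induce only an `"add"` if the snapshot held no head-coach fact, which is the contrast with state-independent information extraction that the abstract draws.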
Related papers
- Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking [56.27361644734853]
Knowledge Graph Question Answering systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. Despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues. We introduce KGQAGen, an LLM-in-the-loop framework that systematically resolves these pitfalls. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation.
arXiv Detail & Related papers (2025-05-29T14:44:52Z)
- Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema [60.42231674887294]
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base. We ground generation of KG with the authored ontology based on extracted relations to ensure consistency and interpretability. Our work presents a promising direction for scalable KG construction pipeline with minimal human intervention, that yields high quality and human-interpretable KGs.
arXiv Detail & Related papers (2024-12-30T13:36:05Z)
- Text-To-KG Alignment: Comparing Current Methods on Classification Tasks [2.191505742658975]
Knowledge graphs (KGs) provide dense and structured representations of factual information.
Recent work has focused on creating pipeline models that retrieve information from KGs as additional context.
It is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query.
arXiv Detail & Related papers (2023-06-05T13:45:45Z)
- Editing Language Model-based Knowledge Graph Embeddings [40.12918266917595]
We propose a new task of editing language model-based Knowledge Graph embeddings in this paper.
This task is designed to facilitate rapid, data-efficient updates to KG embeddings without compromising the performance of other aspects.
We build four new datasets and evaluate several knowledge editing baselines demonstrating the limited ability of previous models to handle the proposed challenging task.
arXiv Detail & Related papers (2023-01-25T04:45:06Z)
- BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models [65.51390418485207]
We propose a new approach of harvesting massive KGs of arbitrary relations from pretrained LMs.
With minimal input of a relation definition, the approach efficiently searches in the vast entity pair space to extract diverse accurate knowledge.
We deploy the approach to harvest KGs of over 400 new relations from different LMs.
arXiv Detail & Related papers (2022-06-28T19:46:29Z)
- Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training [22.534866015730664]
We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
arXiv Detail & Related papers (2020-10-23T22:14:50Z)
- Language Models are Open Knowledge Graphs [75.48081086368606]
Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training.
In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs.
We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora.
arXiv Detail & Related papers (2020-10-22T18:01:56Z)
- Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering [50.72473345911147]
This paper augments a general commonsense QA framework with a knowledgeable path generator.
By extrapolating over existing paths in a KG with a state-of-the-art language model, our generator learns to connect a pair of entities in text with a dynamic, and potentially novel, multi-hop relational path.
arXiv Detail & Related papers (2020-05-02T03:53:21Z)
- Entity Type Prediction in Knowledge Graphs using Embeddings [2.7528170226206443]
Open Knowledge Graphs (such as DBpedia, Wikidata, YAGO) have been recognized as the backbone of diverse applications in the field of data mining and information retrieval.
These KGs are mostly created via automated information extraction from snapshots, via information accumulated from users, or using Wikipedia.
It has been observed that the type information of these KGs is often noisy, incomplete, and incorrect.
A multi-label classification approach is proposed in this work for entity typing using KG embeddings.
arXiv Detail & Related papers (2020-04-28T17:57:08Z)
- Toward Subgraph-Guided Knowledge Graph Question Generation with Graph Neural Networks [53.58077686470096]
Knowledge graph (KG) question generation (QG) aims to generate natural language questions from KGs and target answers.
In this work, we focus on a more realistic setting where we aim to generate questions from a KG subgraph and target answers.
arXiv Detail & Related papers (2020-04-13T15:43:22Z)