Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets
- URL: http://arxiv.org/abs/2505.10798v1
- Date: Fri, 16 May 2025 02:24:32 GMT
- Title: Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets
- Authors: Erica Cai, Sean McQuade, Kevin Young, Brendan O'Connor,
- Abstract summary: AffilKG is a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs.<n>Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities.
- Score: 3.9244082434642555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets can not be used to evaluate this question, since their KGs are highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG (https://doi.org/10.5281/zenodo.15427977), which is a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities -- useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments demonstrate significant variability in model performance across datasets, underscoring AffilKG's ability to enable two critical advances: (1) benchmarking how extraction errors propagate to graph-level analyses (e.g., community structure), and (2) validating KG extraction methods for real-world social science research.
Related papers
- Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs [4.818309069556584]
Knowledge graphs (KGs) are useful for analyzing social structures, community dynamics, institutional memberships, and other complex relationships.<n>Recent advances in large language models (LLMs) have improved the scalability and accessibility of automated KG extraction from large text corpora.<n>The impacts of extraction errors on downstream analyses are poorly understood, especially for applied scientists who depend on accurate KGs for real-world insights.
arXiv Detail & Related papers (2025-06-14T06:14:06Z) - MuCo-KGC: Multi-Context-Aware Knowledge Graph Completion [0.0]
Multi-Context-Aware Knowledge Graph Completion (MuCo-KGC) is a novel model that utilizes contextual information from linked entities and relations within the graph to predict tail entities.<n>MuCo-KGC eliminates the need for entity descriptions and negative triplet sampling, significantly reducing computational complexity while enhancing performance.<n>Our experiments on standard datasets, including FB15k-237, WN18RR, CoDEx-S, and CoDEx-M, demonstrate that MuCo-KGC outperforms state-of-the-art methods on three datasets.
arXiv Detail & Related papers (2025-03-05T01:18:11Z) - Efficient Relational Context Perception for Knowledge Graph Completion [25.903926643251076]
Knowledge Graphs (KGs) provide a structured representation of knowledge but often suffer from challenges of incompleteness.<n>Previous knowledge graph embedding models are limited in their ability to capture expressive features.<n>We propose Triple Receptance Perception architecture to model sequential information, enabling the learning of dynamic context.
arXiv Detail & Related papers (2024-12-31T11:25:58Z) - iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models [0.7165255458140439]
iText2KG is a method for incremental, topic-independent Knowledge Graph construction without post-processing.
Our method demonstrates superior performance compared to baseline methods across three scenarios.
arXiv Detail & Related papers (2024-09-05T06:49:14Z) - GS-KGC: A Generative Subgraph-based Framework for Knowledge Graph Completion with Large Language Models [7.995716933782121]
We propose a novel completion framework called textbfGenerative textbfSubgraph-based KGC (GS-KGC)<n>This framework primarily includes a subgraph partitioning algorithm designed to generate negatives and neighbors.<n>Experiments conducted on four common KGC datasets highlight the advantages of the proposed GS-KGC.
arXiv Detail & Related papers (2024-08-20T13:13:41Z) - A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic,
and Multimodal [57.8455911689554]
Knowledge graph reasoning (KGR) aims to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs)
It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering, recommendation systems, and etc.
arXiv Detail & Related papers (2022-12-12T08:40:04Z) - KGxBoard: Explainable and Interactive Leaderboard for Evaluation of
Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data.
In our experiments, we highlight the findings with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z) - Explainable Sparse Knowledge Graph Completion via High-order Graph
Reasoning Network [111.67744771462873]
This paper proposes a novel explainable model for sparse Knowledge Graphs (KGs)
It combines high-order reasoning into a graph convolutional network, namely HoGRN.
It can not only improve the generalization ability to mitigate the information insufficiency issue but also provide interpretability.
arXiv Detail & Related papers (2022-07-14T10:16:56Z) - EventNarrative: A large-scale Event-centric Dataset for Knowledge
Graph-to-Text Generation [8.216976747904726]
EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text, 6 times larger than the current largest parallel dataset.
Our aim is two-fold: help break new ground in event-centric research where data is lacking, and to give researchers a well-defined, large-scale dataset.
arXiv Detail & Related papers (2021-10-30T15:39:20Z) - Learning Intents behind Interactions with Knowledge Graph for
Recommendation [93.08709357435991]
Knowledge graph (KG) plays an increasingly important role in recommender systems.
Existing GNN-based models fail to identify user-item relation at a fine-grained level of intents.
We propose a new model, Knowledge Graph-based Intent Network (KGIN)
arXiv Detail & Related papers (2021-02-14T03:21:36Z) - Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose a Semantic Graph Convolutional Networks (SGCN) that explores the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z) - ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.