Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs
- URL: http://arxiv.org/abs/2506.12367v1
- Date: Sat, 14 Jun 2025 06:14:06 GMT
- Title: Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs
- Authors: Erica Cai, Brendan O'Connor,
- Abstract summary: Knowledge graphs (KGs) are useful for analyzing social structures, community dynamics, institutional memberships, and other complex relationships.<n>Recent advances in large language models (LLMs) have improved the scalability and accessibility of automated KG extraction from large text corpora.<n>The impacts of extraction errors on downstream analyses are poorly understood, especially for applied scientists who depend on accurate KGs for real-world insights.
- Score: 4.818309069556584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graphs (KGs) are useful for analyzing social structures, community dynamics, institutional memberships, and other complex relationships across domains from sociology to public health. While recent advances in large language models (LLMs) have improved the scalability and accessibility of automated KG extraction from large text corpora, the impacts of extraction errors on downstream analyses are poorly understood, especially for applied scientists who depend on accurate KGs for real-world insights. To address this gap, we conducted the first evaluation of KG extraction performance at two levels: (1) micro-level edge accuracy, which is consistent with standard NLP evaluations, and manual identification of common error sources; (2) macro-level graph metrics that assess structural properties such as community detection and connectivity, which are relevant to real-world applications. Focusing on affiliation graphs of person membership in organizations extracted from social register books, our study identifies a range of extraction performance where biases across most downstream graph analysis metrics are near zero. However, as extraction performance declines, we find that many metrics exhibit increasingly pronounced biases, with each metric tending toward a consistent direction of either over- or under-estimation. Through simulations, we further show that error models commonly used in the literature do not capture these bias patterns, indicating the need for more realistic error models for KG extraction. Our findings provide actionable insights for practitioners and underscores the importance of advancing extraction methods and error modeling to ensure reliable and meaningful downstream analyses.
Related papers
- Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets [3.9244082434642555]
AffilKG is a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs.<n>Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities.
arXiv Detail & Related papers (2025-05-16T02:24:32Z) - ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training [38.3788247358102]
This paper presents an anomaly detection algorithm in knowledge graphs with dual-channel learning (ADKGD)<n>We introduce a kullback-leibler (KL)-loss component to improve the accuracy of the scoring function between the dual channels.<n> Experimental results demonstrate that ADKGD outperforms the state-of-the-art anomaly detection algorithms.
arXiv Detail & Related papers (2025-01-13T06:22:52Z) - KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data [13.510494408303536]
KnowGraph integrates domain knowledge with data-driven learning for enhanced graph-based anomaly detection.
Tests show KnowGraph consistently outperforms state-of-the-art baselines in both transductive and inductive settings.
Results highlight the potential of integrating domain knowledge into data-driven models for high-stakes, graph-based security applications.
arXiv Detail & Related papers (2024-10-10T21:53:33Z) - Mitigating Relational Bias on Knowledge Graphs [51.346018842327865]
We propose Fair-KGNN, a framework that simultaneously alleviates multi-hop bias and preserves the proximity information of entity-to-relation in knowledge graphs.
We develop two instances of Fair-KGNN incorporating with two state-of-the-art KGNN models, RGCN and CompGCN, to mitigate gender-occupation and nationality-salary bias.
arXiv Detail & Related papers (2022-11-26T05:55:34Z) - A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information for the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z) - How Knowledge Graph and Attention Help? A Quantitative Analysis into
Bag-level Relation Extraction [66.09605613944201]
We quantitatively evaluate the effect of attention and Knowledge Graph on bag-level relation extraction (RE)
We find that (1) higher attention accuracy may lead to worse performance as it may harm the model's ability to extract entity mention features; (2) the performance of attention is largely influenced by various noise distribution patterns; and (3) KG-enhanced attention indeed improves RE performance, while not through enhanced attention but by incorporating entity prior.
arXiv Detail & Related papers (2021-07-26T09:38:28Z) - Model-Agnostic Graph Regularization for Few-Shot Learning [60.64531995451357]
We present a comprehensive study on graph embedded few-shot learning.
We introduce a graph regularization approach that allows a deeper understanding of the impact of incorporating graph information between labels.
Our approach improves the performance of strong base learners by up to 2% on Mini-ImageNet and 6.7% on ImageNet-FS.
arXiv Detail & Related papers (2021-02-14T05:28:13Z) - RelWalk A Latent Variable Model Approach to Knowledge Graph Embedding [50.010601631982425]
This paper extends the random walk model (Arora et al., 2016a) of word embeddings to Knowledge Graph Embeddings (KGEs)
We derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail)
We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph.
arXiv Detail & Related papers (2021-01-25T13:31:29Z) - Graph Representation Learning via Graphical Mutual Information
Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.