Evaluating Knowledge Graph Complexity via Semantic, Spectral, and Structural Metrics for Link Prediction
- URL: http://arxiv.org/abs/2508.15291v1
- Date: Thu, 21 Aug 2025 06:27:20 GMT
- Title: Evaluating Knowledge Graph Complexity via Semantic, Spectral, and Structural Metrics for Link Prediction
- Authors: Haji Gul, Abul Ghani Naim, Ajaz Ahmad Bhat
- Abstract summary: We introduce and benchmark a set of structural and semantic KG complexity metrics. We find that CSG is highly sensitive to parametrisation and does not robustly scale with the number of classes. Our results demonstrate that CSG's purported stability and its power to predict generalization fail to hold in link prediction settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding dataset complexity is fundamental to evaluating and comparing link prediction models on knowledge graphs (KGs). The Cumulative Spectral Gradient (CSG) metric, derived from probabilistic divergence between classes within a spectral clustering framework, has been proposed as a classifier-agnostic complexity metric that purportedly scales with class cardinality and correlates with downstream performance, but it has not yet been evaluated in KG settings. In this work, we critically examine CSG in the context of multi-relational link prediction, incorporating semantic representations via transformer-derived embeddings. Contrary to prior claims, we find that CSG is highly sensitive to parametrisation and does not robustly scale with the number of classes. Moreover, it exhibits weak or inconsistent correlation with standard performance metrics such as Mean Reciprocal Rank (MRR) and Hit@1. To deepen the analysis, we introduce and benchmark a set of structural and semantic KG complexity metrics. Our findings reveal that global and local relational ambiguity, captured via Relation Entropy, node-level Maximum Relation Diversity, and Relation Type Cardinality, exhibits strong inverse correlations with MRR and Hit@1, suggesting that these metrics are more faithful indicators of task difficulty. Conversely, graph connectivity measures such as Average Degree, Degree Entropy, PageRank, and Eigenvector Centrality correlate positively with Hit@10. Our results demonstrate that CSG's purported stability and its power to predict generalization fail to hold in link prediction settings, and they underscore the need for more stable, interpretable, and task-aligned measures of dataset complexity in knowledge-driven learning.
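The abstract names several complexity and ranking metrics without defining them. As a rough illustration only, the sketch below computes a few of them on a toy triple list in plain Python. The definitions are assumptions, not the authors' formulas: Relation Entropy is taken as the Shannon entropy of relation frequencies over triples, Maximum Relation Diversity as the largest number of distinct relation types touching any single entity, and Average Degree as twice the triple count divided by the entity count. MRR and Hit@k follow their usual definitions over 1-based ranks. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def relation_entropy(triples):
    """Shannon entropy (bits) of the relation frequency distribution (assumed definition)."""
    counts = Counter(r for _, r, _ in triples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relation_type_cardinality(triples):
    """Number of distinct relation types in the graph."""
    return len({r for _, r, _ in triples})


def max_relation_diversity(triples):
    """Largest number of distinct relation types incident on one entity (assumed definition)."""
    relations_per_entity = defaultdict(set)
    for head, rel, tail in triples:
        relations_per_entity[head].add(rel)
        relations_per_entity[tail].add(rel)
    return max(len(rels) for rels in relations_per_entity.values())


def average_degree(triples):
    """Mean degree when every triple is treated as an undirected edge (assumed definition)."""
    entities = {e for head, _, tail in triples for e in (head, tail)}
    return 2 * len(triples) / len(entities)


def mrr(ranks):
    """Mean Reciprocal Rank over 1-based ranks of the correct entity."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Hypothetical toy knowledge graph of (head, relation, tail) triples.
toy_triples = [
    ("alice", "knows", "bob"),
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "berlin"),
]

if __name__ == "__main__":
    print("relation entropy:", relation_entropy(toy_triples))
    print("relation type cardinality:", relation_type_cardinality(toy_triples))
    print("max relation diversity:", max_relation_diversity(toy_triples))
    print("average degree:", average_degree(toy_triples))
    print("MRR:", mrr([1, 2, 4]), "Hit@1:", hits_at_k([1, 2, 4], 1))
```

To reproduce the kind of analysis the abstract describes, one would compute such metrics per dataset and correlate them with the reported MRR and Hit@k values; the choice of correlation statistic is not stated in the abstract.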
Related papers
- From Global to Granular: Revealing IQA Model Performance via Correlation Surface [83.65597122328133]
We present Granularity-Modulated Correlation (GMC), which provides a structured, fine-grained analysis of IQA performance.
GMC includes a Distribution Regulator that regularizes correlations to mitigate biases from non-uniform quality distributions.
Experiments on standard benchmarks show that GMC reveals performance characteristics invisible to scalar metrics, offering a more informative and reliable paradigm for analyzing, comparing, and deploying IQA models.
arXiv Detail & Related papers (2026-01-29T13:55:26Z) - Correlation-Aware Feature Attribution Based Explainable AI [4.457502798302293]
ExCIR is a correlation-aware attribution score equipped with a lightweight transfer protocol.
BlockCIR mitigates double-counting in collinear clusters.
The approach provides computationally efficient, consistent, and scalable explainability for real-world deployment.
arXiv Detail & Related papers (2025-11-20T15:51:00Z) - From Kernels to Attention: A Transformer Framework for Density and Score Estimation [0.47745223151611654]
We introduce a unified attention-based framework for joint score and density estimation.
We develop a permutation- and affine-equivariant transformer that estimates both the probability density $f(x)$ and its score $\nabla_x \log f(x)$ directly from i.i.d. samples.
arXiv Detail & Related papers (2025-11-08T08:38:37Z) - Evaluating Cumulative Spectral Gradient as a Complexity Measure [0.0]
Cumulative Spectral Gradient (CSG) was proposed as a dataset complexity measure.
In this work, we rigorously assess CSG behavior on standard knowledge graph link prediction benchmarks.
arXiv Detail & Related papers (2025-09-02T15:10:25Z) - Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks [0.9663199711697325]
Small sample sizes in structural connectome (SC) studies limit the development of reliable biomarkers for neurological and psychiatric disorders.
Large-scale multi-site studies exist, but they have acquisition-related biases due to scanner heterogeneity.
We propose a site-conditioned deep harmonization framework that harmonizes SCs across diverse acquisition sites without requiring metadata.
arXiv Detail & Related papers (2025-07-18T14:58:05Z) - Information-theoretic Quantification of High-order Feature Effects in Classification Problems [0.19791587637442676]
We present an information-theoretic extension of the High-order interactions for Feature importance (Hi-Fi) method.
Our framework decomposes feature contributions into unique, synergistic, and redundant components.
Results indicate that the proposed estimator accurately recovers theoretical and expected findings.
arXiv Detail & Related papers (2025-07-06T11:50:30Z) - Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models.
We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - Spuriousness-Aware Meta-Learning for Learning Robust Classifiers [26.544938760265136]
Spurious correlations are brittle associations between certain attributes of inputs and target variables.
Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold.
Mitigating the impact of spurious correlations is crucial towards robust model generalization, but it often requires annotations of the spurious correlations in data.
arXiv Detail & Related papers (2024-06-15T21:41:25Z) - Federated Contrastive Learning for Personalized Semantic Communication [55.46383524190467]
We design a federated contrastive learning framework aimed at supporting personalized semantic communication.
FedCL enables collaborative training of local semantic encoders across multiple clients and a global semantic decoder owned by the base station.
To tackle the semantic imbalance issue arising from heterogeneous datasets across distributed clients, we employ contrastive learning to train a semantic centroid generator.
arXiv Detail & Related papers (2024-06-13T14:45:35Z) - Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases.
Grouping variables statistically via clustering or some prior knowledge gains some power back.
We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
arXiv Detail & Related papers (2022-11-29T19:27:26Z) - ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely, Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z) - Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)