GRainsaCK: a Comprehensive Software Library for Benchmarking Explanations of Link Prediction Tasks on Knowledge Graphs
- URL: http://arxiv.org/abs/2508.08815v1
- Date: Tue, 12 Aug 2025 10:15:58 GMT
- Title: GRainsaCK: a Comprehensive Software Library for Benchmarking Explanations of Link Prediction Tasks on Knowledge Graphs
- Authors: Roberto Barile, Claudia d'Amato, Nicola Fanizzi
- Abstract summary: Explanation methods tackle this issue by identifying supporting knowledge explaining the predicted facts. We propose GRainsaCK, a reusable software resource that fully streamlines all the tasks involved in benchmarking explanations. GRainsaCK furthers modularity/extensibility by implementing the main components as functions that can be easily replaced.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since Knowledge Graphs are often incomplete, link prediction methods are adopted for predicting missing facts. Scalable embedding based solutions are mostly adopted for this purpose, however, they lack comprehensibility, which may be crucial in several domains. Explanation methods tackle this issue by identifying supporting knowledge explaining the predicted facts. Regretfully, evaluating/comparing quantitatively the resulting explanations is challenging as there is no standard evaluation protocol and overall benchmarking resource. We fill this important gap by proposing GRainsaCK, a reusable software resource that fully streamlines all the tasks involved in benchmarking explanations, i.e., from model training to evaluation of explanations along the same evaluation protocol. Moreover, GRainsaCK furthers modularity/extensibility by implementing the main components as functions that can be easily replaced. Finally, fostering its reuse, we provide extensive documentation including a tutorial.
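The abstract describes a single pipeline, from model training to explanation evaluation, whose main components are functions that can be swapped out. The following is a minimal, hypothetical Python sketch of that design idea only; it is not GRainsaCK's API. The class `BenchmarkPipeline`, all stage functions, the toy "model" (a memorized fact set standing in for a trained embedding model) and the conciseness metric are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
Model = FrozenSet[Triple]      # toy stand-in for a trained link-prediction model


@dataclass(frozen=True)
class BenchmarkPipeline:
    """Hypothetical benchmark pipeline: each stage is a plain, swappable function."""
    train: Callable[[List[Triple]], Model]
    predict: Callable[[Model, List[Triple]], List[Triple]]
    explain: Callable[[Model, Triple], List[Triple]]
    evaluate: Callable[[Triple, List[Triple]], float]

    def run(self, train_set: List[Triple], queries: List[Triple]) -> Dict[Triple, float]:
        # Same protocol for every configuration: train -> predict -> explain -> evaluate.
        model = self.train(train_set)
        scores: Dict[Triple, float] = {}
        for fact in self.predict(model, queries):
            explanation = self.explain(model, fact)
            scores[fact] = self.evaluate(fact, explanation)
        return scores


def train_memorize(train_set: List[Triple]) -> Model:
    # Toy "training": remember the known facts.
    return frozenset(train_set)


def predict_unseen(model: Model, queries: List[Triple]) -> List[Triple]:
    # Toy "prediction": every query not already known is treated as a predicted fact.
    return [q for q in queries if q not in model]


def explain_neighbourhood(model: Model, fact: Triple) -> List[Triple]:
    # Toy explainer: known facts sharing the head or tail entity of the prediction.
    head, _, tail = fact
    return sorted(t for t in model if {t[0], t[2]} & {head, tail})


def explain_head_only(model: Model, fact: Triple) -> List[Triple]:
    # Alternative toy explainer: known facts that mention the head entity only.
    head = fact[0]
    return sorted(t for t in model if head in (t[0], t[2]))


def evaluate_conciseness(fact: Triple, explanation: List[Triple]) -> float:
    # Toy metric (an assumption, not a real protocol): shorter explanations score higher.
    return 1.0 / (1 + len(explanation))


base = BenchmarkPipeline(train_memorize, predict_unseen, explain_neighbourhood, evaluate_conciseness)
train_set = [("a", "r", "b"), ("b", "r", "c"), ("d", "s", "e")]
queries = [("a", "r", "c"), ("a", "r", "b")]

print(base.run(train_set, queries))     # {('a', 'r', 'c'): 0.3333333333333333}
variant = replace(base, explain=explain_head_only)  # swap only the explainer
print(variant.run(train_set, queries))  # {('a', 'r', 'c'): 0.5}
```

Because each stage is a plain function field, `dataclasses.replace` swaps one component (here the explainer) while the rest of the protocol stays fixed, so two explanation methods are compared under identical training, prediction and evaluation steps.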
Related papers
- COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations
We present a large-scale dataset of 104k posts with user-provided notes and helpfulness labels.
We propose a framework that automatically generates and improves reason definitions via automatic prompt optimization.
Our experiments show that the optimized definitions can improve both helpfulness and reason prediction.
(arXiv, 2025-10-28T05:28:47Z)
- What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge
Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs.
Existing benchmarks often include questions that can be directly answered using existing triples in the KG.
In this work, we introduce a general method for constructing benchmarks, together with an evaluation protocol, to systematically assess KG-RAG methods under knowledge incompleteness.
(arXiv, 2025-08-11T10:55:06Z)
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.
It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.
We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
(arXiv, 2025-08-05T17:55:24Z)
- Unifying Post-hoc Explanations of Knowledge Graph Completions
Post-hoc explainability for Knowledge Graph Completion (KGC) lacks formalization and consistent evaluations.
This paper argues for a unified approach to post-hoc explainability in KGC.
(arXiv, 2025-07-29T13:31:48Z)
- Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models
We propose MPIKGC to compensate for the deficiency of contextualized knowledge and improve KGC by querying large language models (LLMs).
We conducted an extensive evaluation of our framework based on four description-based KGC models and four datasets, for both link prediction and triplet classification tasks.
(arXiv, 2024-03-04T12:16:15Z)
- KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment
We propose a general unlearning framework called KGA to induce forgetfulness.
Experiments on large-scale datasets show that KGA yields comprehensive improvements over baselines.
(arXiv, 2023-05-11T02:44:29Z)
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization
State-of-the-art summarization systems can generate highly fluent summaries.
These summaries, however, may contain factual inconsistencies and/or information not present in the source.
We introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared.
(arXiv, 2023-01-28T23:08:25Z)
- CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms
CARLA (Counterfactual And Recourse LibrAry) is a Python library for benchmarking counterfactual explanation methods.
We provide an extensive benchmark of 11 popular counterfactual explanation methods.
We also provide a benchmarking framework for research on future counterfactual explanation methods.
(arXiv, 2021-08-02T11:00:43Z)
- Abstractive Query Focused Summarization with Query-Free Resources
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
(arXiv, 2020-12-29T14:39:35Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
(arXiv, 2020-09-04T15:32:19Z)
- Structured Prediction with Partial Labelling through the Infimum Loss
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
Partial labelling is a type of incomplete annotation where, for each data point, supervision is cast as a set of labels containing the true one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
(arXiv, 2020-03-02T13:59:41Z)
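The infimum loss scores a prediction z against a set S of candidate labels by its best match, L(z, S) = inf over y in S of ℓ(z, y), where ℓ is a loss for fully labelled data. Below is a minimal Python sketch for a finite label space; the 0-1 base loss, the search over constant predictors, and the toy labels are illustrative assumptions, not the paper's experimental setup.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Set


def zero_one(z: Hashable, y: Hashable) -> float:
    # Base loss on fully labelled data (0-1 loss, chosen here for illustration).
    return 0.0 if z == y else 1.0


def infimum_loss(z: Hashable, candidates: Iterable[Hashable],
                 base_loss: Callable[[Hashable, Hashable], float]) -> float:
    # The true label is an unknown member of `candidates`, so the prediction z
    # is charged only for its best-matching candidate.
    return min(base_loss(z, y) for y in candidates)


def best_constant_prediction(label_space: Sequence[Hashable],
                             partial_labels: List[Set[Hashable]],
                             base_loss: Callable[[Hashable, Hashable], float]) -> Hashable:
    # Empirical risk minimisation over constant predictors under the infimum loss.
    def risk(z: Hashable) -> float:
        return sum(infimum_loss(z, s, base_loss) for s in partial_labels) / len(partial_labels)
    return min(label_space, key=risk)


partial_labels = [{"cat", "dog"}, {"dog"}, {"dog", "fox"}, {"cat"}]
print(best_constant_prediction(["cat", "dog", "fox"], partial_labels, zero_one))  # dog
```

Here "dog" is chosen because it is compatible with three of the four candidate sets (empirical risk 0.25), whereas "cat" and "fox" are compatible with only two and one of them, respectively.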