xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models
- URL: http://arxiv.org/abs/2507.08432v1
- Date: Fri, 11 Jul 2025 09:18:41 GMT
- Title: xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models
- Authors: Gustavo Correa Publio, José Emilio Labra Gayo
- Abstract summary: Shapes Constraint Language (SHACL) is a powerful language for validating RDF data. This paper presents xpSHACL, an explainable SHACL validation system. It combines rule-based justification trees with retrieval-augmented generation (RAG) and large language models (LLMs) to produce detailed, multilanguage explanations for constraint violations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Shapes Constraint Language (SHACL) is a powerful language for validating RDF data. Given the recent industry attention to Knowledge Graphs (KGs), more users need to validate linked data properly. However, traditional SHACL validation engines often provide terse reports in English that are difficult for non-technical users to interpret and act upon. This paper presents xpSHACL, an explainable SHACL validation system that addresses this issue by combining rule-based justification trees with retrieval-augmented generation (RAG) and large language models (LLMs) to produce detailed, multilanguage, human-readable explanations for constraint violations. A key feature of xpSHACL is its usage of a Violation KG to cache and reuse explanations, improving efficiency and consistency.
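The caching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the xpSHACL implementation: the `Violation` fields, the `signature` key, the `ExplanationCache` class, and `stub_generator` are all hypothetical names chosen for illustration, the store is a plain in-memory dictionary rather than a knowledge graph, and the generator is a stub standing in for the RAG and LLM step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Violation:
    """A simplified SHACL validation result (hypothetical structure)."""
    focus_node: str
    shape: str
    constraint: str  # e.g. "sh:MinCountConstraintComponent"
    path: str        # e.g. "ex:name"

    def signature(self) -> Tuple[str, str, str]:
        # Assumption: violations of the same constraint on the same path and
        # shape can share one explanation, regardless of the focus node.
        return (self.shape, self.constraint, self.path)


class ExplanationCache:
    """In-memory stand-in for a store of reusable explanations."""

    def __init__(self, generate: Callable[[Violation, str], str]) -> None:
        self._generate = generate
        self._store: Dict[Tuple[Tuple[str, str, str], str], str] = {}
        self.generator_calls = 0

    def explain(self, violation: Violation, language: str = "en") -> str:
        key = (violation.signature(), language)
        if key not in self._store:
            # Cache miss: run the (expensive) generator once and keep the result.
            self.generator_calls += 1
            self._store[key] = self._generate(violation, language)
        return self._store[key]


def stub_generator(violation: Violation, language: str) -> str:
    # Placeholder for the expensive explanation step; no model is called here.
    return (f"[{language}] {violation.path} violates "
            f"{violation.constraint} in {violation.shape}")


cache = ExplanationCache(stub_generator)
alice = Violation("ex:alice", "ex:PersonShape",
                  "sh:MinCountConstraintComponent", "ex:name")
bob = Violation("ex:bob", "ex:PersonShape",
                "sh:MinCountConstraintComponent", "ex:name")
first = cache.explain(alice)
second = cache.explain(bob)  # same signature, so the cached text is reused
```

In this sketch the second lookup reuses the stored explanation because both violations share a signature, which is one plausible way caching could improve the efficiency and consistency the abstract describes. Requesting another language adds a separate cache entry.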
Related papers
- SHACL Validation under Graph Updates (Extended Paper) [6.755812289103844]
We present a SHACL-based update language that can capture intuitive and realistic modifications on RDF graphs. This problem asks to verify whether every graph that validates a SHACL specification will still do so after applying a given update sequence. We show that static validation can be reduced to (un)satisfiability of constraints in (a minor extension of) SHACL.
arXiv Detail & Related papers (2025-07-31T19:58:16Z) - Towards Operationalizing Right to Data Protection [8.61230665736263]
RegText is a framework that injects imperceptible correlations into natural language datasets effectively rendering them unlearnable without affecting content.
We demonstrate RegText's utility through rigorous empirical analysis of small and large LMs.
RegText can restrict newer models like GPT-4o and Llama from learning on our generated data, resulting in a drop in their test accuracy compared to their zero-shot performance.
arXiv Detail & Related papers (2024-11-13T10:43:31Z) - Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs [61.796960984541464]
We present COM2 (COMplex COMmonsense), a new dataset created by sampling logical queries.
We verbalize them using handcrafted rules and large language models into multiple-choice and text generation questions.
Experiments show that language models trained on COM2 exhibit significant improvements in complex reasoning ability.
arXiv Detail & Related papers (2024-03-12T08:13:52Z) - ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases [53.29427395419317]
Reasoning over Commonsense Knowledge Bases (CSKB) has been explored as a way to acquire new commonsense knowledge.
We propose **ConstraintChecker**, a plugin over prompting techniques to provide and check explicit constraints.
arXiv Detail & Related papers (2024-01-25T08:03:38Z) - ChatRule: Mining Logical Rules with Large Language Models for Knowledge Graph Reasoning [107.61997887260056]
We propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs.
Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs.
To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs.
arXiv Detail & Related papers (2023-09-04T11:38:02Z) - Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z) - Error-Robust Retrieval for Chinese Spelling Check [43.56073620728942]
Chinese Spelling Check (CSC) aims to detect and correct error tokens in Chinese contexts.
Previous methods may not fully leverage the existing datasets.
We introduce our plug-and-play retrieval method with error-robust information for Chinese Spelling Check.
arXiv Detail & Related papers (2022-11-15T01:55:34Z) - CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation [91.16551253297588]
COunterfactual Generation via Retrieval and Editing (CORE) is a retrieval-augmented generation framework for creating diverse counterfactual perturbations for training.
CORE first performs a dense retrieval over a task-related unlabeled text corpus using a learned bi-encoder.
CORE then incorporates these into prompts to a large language model with few-shot learning capabilities, for counterfactual editing.
arXiv Detail & Related papers (2022-10-10T17:45:38Z) - A Review of SHACL: From Data Validation to Schema Reasoning for RDF Graphs [3.274290296343038]
We present an introduction and a review of the Shapes Constraint Language (SHACL), the W3C recommendation language for validating RDF data.
A SHACL document describes a set of constraints on RDF nodes, and a graph is valid with respect to the document if its nodes satisfy these constraints.
arXiv Detail & Related papers (2021-12-02T17:28:45Z) - GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words that have long-range dependencies or that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z) - SHACL Satisfiability and Containment (Extended Paper) [6.308539010172308]
The Shapes Constraint Language (SHACL) is a recent W3C recommendation language for validating RDF data.
In this paper, we undertake a thorough study of different features of non-recursive SHACL by providing a translation to a new first-order language, called SCL.
We study the interaction of SHACL features in this logic and provide the detailed map of decidability and complexity results of the aforementioned decision problems for different SHACL sublanguages.
arXiv Detail & Related papers (2020-08-31T14:52:03Z)