Continuous Monitoring of Large-Scale Generative AI via Deterministic Knowledge Graph Structures
- URL: http://arxiv.org/abs/2509.03857v1
- Date: Thu, 04 Sep 2025 03:34:49 GMT
- Title: Continuous Monitoring of Large-Scale Generative AI via Deterministic Knowledge Graph Structures
- Authors: Kishor Datta Gupta, Mohd Ariful Haque, Hasmot Ali, Marufa Kamal, Syed Bahauddin Alam, Mohammad Ashiqur Rahman
- Abstract summary: This research proposes a systematic methodology using deterministic and Large Language Model (LLM)-generated Knowledge Graphs (KGs) to monitor AI reliability. We construct two KGs: (i) a deterministic KG built using explicit rule-based methods, dictionaries, and structured entity-relation extraction rules, and (ii) an LLM-generated KG dynamically derived from real-time data streams such as live news articles. To quantify hallucinations and semantic discrepancies, we employ several established KG metrics, including Instantiated Class Ratio (ICR), Instantiated Property Ratio (IPR), and Class Instantiation (CI).
- Score: 2.7277205894982095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI (GEN AI) models have revolutionized diverse application domains but present substantial challenges due to reliability concerns, including hallucinations, semantic drift, and inherent biases. These models typically operate as black boxes, complicating transparent and objective evaluation. Current evaluation methods primarily depend on subjective human assessment, limiting scalability, transparency, and effectiveness. This research proposes a systematic methodology using deterministic and Large Language Model (LLM)-generated Knowledge Graphs (KGs) to continuously monitor and evaluate GEN AI reliability. We construct two parallel KGs: (i) a deterministic KG built using explicit rule-based methods, predefined ontologies, domain-specific dictionaries, and structured entity-relation extraction rules, and (ii) an LLM-generated KG dynamically derived from real-time textual data streams such as live news articles. Utilizing real-time news streams ensures authenticity, mitigates biases from repetitive training, and prevents adaptive LLMs from bypassing predefined benchmarks through feedback memorization. To quantify structural deviations and semantic discrepancies, we employ several established KG metrics, including Instantiated Class Ratio (ICR), Instantiated Property Ratio (IPR), and Class Instantiation (CI). An automated real-time monitoring framework continuously computes deviations between deterministic and LLM-generated KGs. By establishing dynamic anomaly thresholds based on historical structural metric distributions, our method proactively identifies and flags significant deviations, thus promptly detecting semantic anomalies or hallucinations. This structured, metric-driven comparison between deterministic and dynamically generated KGs delivers a robust and scalable evaluation framework.
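The abstract names three structural KG metrics (ICR, IPR, CI) and a dynamic, history-based anomaly threshold, but does not spell out their formulas. The sketch below is a minimal illustration of how such a monitor could work, assuming common definitions from the KG-quality literature: ICR as the fraction of schema classes with at least one instance, IPR as the fraction of schema properties used in at least one triple, and CI as the mean number of instances per instantiated class. The toy schema, triples, and three-sigma threshold are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, stdev

# Toy KG as (subject, predicate, object) triples; "rdf:type" links an
# instance to its class. The schema, the metric formulas, and the
# threshold rule below are assumptions -- the abstract names the metrics
# but does not define them.
SCHEMA_CLASSES = {"Person", "Organization", "Event", "Location"}
SCHEMA_PROPERTIES = {"worksFor", "attended", "locatedIn", "reportedBy"}

def kg_metrics(triples):
    """Compute assumed definitions of ICR, IPR, and CI for one KG."""
    instances = {}       # class -> set of instances of that class
    used_props = set()   # properties appearing in non-typing triples
    for s, p, o in triples:
        if p == "rdf:type":
            instances.setdefault(o, set()).add(s)
        else:
            used_props.add(p)
    icr = len(instances.keys() & SCHEMA_CLASSES) / len(SCHEMA_CLASSES)
    ipr = len(used_props & SCHEMA_PROPERTIES) / len(SCHEMA_PROPERTIES)
    ci = mean(len(v) for v in instances.values()) if instances else 0.0
    return {"ICR": icr, "IPR": ipr, "CI": ci}

def is_anomalous(history, value, k=3.0):
    """Dynamic threshold: flag a value more than k standard deviations
    from the historical distribution of deviations."""
    return abs(value - mean(history)) > k * stdev(history)

# Deterministic KG (rule-based extraction) vs. LLM-generated KG from the
# same news text; "wonderland" stands in for a hallucinated entity.
det_kg = [("alice", "rdf:type", "Person"),
          ("acme", "rdf:type", "Organization"),
          ("alice", "worksFor", "acme")]
llm_kg = [("alice", "rdf:type", "Person"),
          ("alice", "worksFor", "wonderland")]

det, gen = kg_metrics(det_kg), kg_metrics(llm_kg)
deviation = {m: abs(det[m] - gen[m]) for m in det}

# Historical per-metric deviations (hypothetical values) define the
# adaptive threshold for each metric.
history = {"ICR": [0.02, 0.05, 0.04, 0.03],
           "IPR": [0.01, 0.02, 0.02, 0.03],
           "CI":  [0.10, 0.20, 0.15, 0.25]}
for m, past in history.items():
    if is_anomalous(past, deviation[m]):
        print(f"anomaly in {m}: deviation {deviation[m]:.2f}")
```

In a live deployment along the lines the abstract describes, the history buffers would be populated from past deterministic-vs-LLM deviations on the news stream, so the thresholds adapt as the metric distributions drift.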
Related papers
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z) - The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI [0.0]
This paper introduces a novel auditing framework to quantify latent trait estimation under ordinal uncertainty. The research audits nine leading models across dimensions including Optimization Bias, Sycophancy, and Status-Quo Legitimization.
arXiv Detail & Related papers (2026-02-19T06:56:01Z) - Volatility in Certainty (VC): A Metric for Detecting Adversarial Perturbations During Inference in Neural Network Classifiers [0.5793804025420254]
Adversarial robustness remains a critical challenge in deploying neural network classifiers. This paper investigates Volatility in Certainty (VC), a label-free metric that quantifies irregularities in model confidence.
arXiv Detail & Related papers (2025-11-14T19:51:04Z) - Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data [9.871276314615447]
This paper proposes a novel framework that integrates generative modeling, sliding-window temporal processing, and decision fusion to achieve robust event detection and classification. Experimental results demonstrate state-of-the-art accuracy, surpassing machine learning, deep learning, and envelope-based baselines.
arXiv Detail & Related papers (2025-09-26T18:04:03Z) - Commuting Distance Regularization for Timescale-Dependent Label Inconsistency in EEG Emotion Recognition [1.4499463058550683]
We address the often-overlooked issue of Timescale-Dependent Label Inconsistency (TsDLI) in training neural network models for EEG-based human emotion recognition. We propose two novel regularization strategies: Local Variation Loss (LVL) and Local-Global Consistency Loss (LGCL). Results consistently show that our proposed methods outperform state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-15T01:22:14Z) - SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts [0.6291443816903801]
This paper introduces a novel framework designed to autonomously evaluate the robustness of large language models (LLMs). Our method generates descriptive sentences from domain-constrained knowledge graph triplets to formulate adversarial prompts. This self-evaluation mechanism allows the LLM to evaluate its robustness without the need for external benchmarks.
arXiv Detail & Related papers (2024-12-01T10:58:53Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Development of Interpretable Machine Learning Models to Detect Arrhythmia based on ECG Data [0.0]
This thesis builds Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classifiers based on state-of-the-art models.
Both global and local interpretability methods are exploited to understand the interaction between dependent and independent variables.
It was found that Grad-CAM was the most effective interpretability technique for explaining the predictions of the proposed CNN and LSTM models.
arXiv Detail & Related papers (2022-05-05T17:29:33Z) - A Priori Denoising Strategies for Sparse Identification of Nonlinear Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
arXiv Detail & Related papers (2022-01-29T23:31:25Z) - Formal Verification of Unknown Dynamical Systems via Gaussian Process Regression [11.729744197698718]
Leveraging autonomous systems in safety-critical scenarios requires verifying their behaviors in the presence of uncertainties.
We develop a framework for verifying discrete-time dynamical systems with unmodelled dynamics and noisy measurements.
arXiv Detail & Related papers (2021-12-31T05:10:05Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.