Compliance as a Trust Metric
- URL: http://arxiv.org/abs/2601.01287v1
- Date: Sat, 03 Jan 2026 21:14:40 GMT
- Title: Compliance as a Trust Metric
- Authors: Wenbo Wu, George Konstantinidis
- Abstract summary: This paper bridges this research gap by operationalizing regulatory compliance as a quantitative and dynamic trust metric. Our contribution is a quantitative model that assesses the severity of each violation along multiple dimensions, including its Volume, Duration, Breadth, and Criticality. We evaluate ACE on a synthetic hospital dataset, demonstrating its ability to accurately detect a range of complex HIPAA and GDPR violations.
- Score: 1.0264137858888513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Trust and Reputation Management Systems (TRMSs) are critical for the modern web, yet their reliance on subjective user ratings or narrow Quality of Service (QoS) metrics lacks objective grounding. Concurrently, while regulatory frameworks like GDPR and HIPAA provide objective behavioral standards, automated compliance auditing has been limited to coarse, binary (pass/fail) outcomes. This paper bridges this research gap by operationalizing regulatory compliance as a quantitative and dynamic trust metric through our novel automated compliance engine (ACE). ACE first formalizes legal and organizational policies into a verifiable, obligation-centric logic. It then continuously audits system event logs against this logic to detect violations. The core of our contribution is a quantitative model that assesses the severity of each violation along multiple dimensions, including its Volume, Duration, Breadth, and Criticality, to compute a fine-grained, evolving compliance score. We evaluate ACE on a synthetic hospital dataset, demonstrating its ability to accurately detect a range of complex HIPAA and GDPR violations and produce a nuanced score that is significantly more expressive than traditional binary approaches. This work enables the development of more transparent, accountable, and resilient TRMSs on the Web.
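The abstract names four severity dimensions (Volume, Duration, Breadth, Criticality) that are aggregated into an evolving compliance score, but does not give the formula here. A minimal illustrative sketch, assuming normalized dimensions, a weighted-sum severity, and a multiplicative score penalty (the weights and both aggregation rules are assumptions, not ACE's published model):

```python
from dataclasses import dataclass

@dataclass
class Violation:
    # Each dimension is normalized to [0, 1]; the normalization scheme
    # and the weights below are illustrative assumptions.
    volume: float       # fraction of records affected
    duration: float     # fraction of the audit window the violation persisted
    breadth: float      # fraction of policy obligations implicated
    criticality: float  # weight assigned to the violated rule

WEIGHTS = {"volume": 0.25, "duration": 0.25, "breadth": 0.2, "criticality": 0.3}

def severity(v: Violation) -> float:
    """Weighted sum of the four dimensions, yielding a value in [0, 1]."""
    return (WEIGHTS["volume"] * v.volume
            + WEIGHTS["duration"] * v.duration
            + WEIGHTS["breadth"] * v.breadth
            + WEIGHTS["criticality"] * v.criticality)

def compliance_score(violations: list[Violation]) -> float:
    """Each detected violation multiplicatively discounts a perfect score of 1.0."""
    score = 1.0
    for v in violations:
        score *= (1.0 - severity(v))
    return score
```

Under this sketch a single mild violation barely moves the score, while many severe ones drive it toward 0, which is the kind of fine-grained signal a binary pass/fail audit cannot produce.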
Related papers
- The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI [0.0]
This paper introduces a novel auditing framework to quantify latent trait estimation under ordinal uncertainty. The research audits nine leading models across dimensions including Optimization Bias, Sycophancy, and Status-Quo Legitimization.
arXiv Detail & Related papers (2026-02-19T06:56:01Z)
- TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents [51.30998248590416]
Trajectory-Aware Comprehensive Evaluation (TRACE) is a framework that holistically assesses the entire problem-solving trajectory. Our contributions include the TRACE framework, its novel metrics, and the accompanying DeepResearch-Bench with controllable complexity.
arXiv Detail & Related papers (2026-02-05T13:28:57Z)
- Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance. We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state. We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z)
- Variance-Bounded Evaluation of Entity-Centric AI Systems Without Ground Truth: Theory and Measurement [0.0]
We introduce VB-Score, a variance-bounded evaluation framework for entity-centric AI systems. VB-Score enumerates plausible interpretations through constraint relaxation and Monte Carlo sampling. It then evaluates system outputs by their expected success across interpretations, penalized by variance to assess the robustness of the system.
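The core VB-Score idea, expected success across plausible interpretations minus a variance penalty, can be sketched in a few lines; the linear standard-deviation penalty and the `lam` coefficient are illustrative assumptions, not the paper's definition:

```python
import statistics

def vb_score(success_by_interpretation: list[float], lam: float = 1.0) -> float:
    """Score = mean success across plausible interpretations, penalized by
    the spread of success across them, so robust systems rank higher.
    The penalty form and lam are assumptions for illustration."""
    mean = statistics.fmean(success_by_interpretation)
    std = statistics.pstdev(success_by_interpretation)
    return mean - lam * std
```

For example, a system that succeeds with probability 0.8 under every interpretation outscores one averaging the same 0.8 but swinging between 1.0 and 0.6, capturing the robustness notion in the summary.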
arXiv Detail & Related papers (2025-09-26T07:54:38Z)
- LLM-as-a-Judge: Rapid Evaluation of Legal Document Recommendation for Retrieval-Augmented Generation [40.06592175227558]
This paper investigates a principled approach to evaluating Retrieval-Augmented Generation systems in legal contexts. We find that traditional agreement metrics like Krippendorff's alpha can be misleading in the skewed distributions typical of AI system evaluations. Our findings suggest a path toward scalable, cost-effective evaluation that maintains the precision demanded by legal applications.
arXiv Detail & Related papers (2025-09-15T19:20:21Z)
- CCE: Confidence-Consistency Evaluation for Time Series Anomaly Detection [56.302586730134806]
We introduce Confidence-Consistency Evaluation (CCE), a novel evaluation metric. CCE simultaneously measures prediction confidence and uncertainty consistency. We also establish RankEval, a benchmark for comparing the ranking capabilities of various metrics.
arXiv Detail & Related papers (2025-09-01T03:38:38Z) - From Reports to Reality: Testing Consistency in Instagram's Digital Services Act Compliance Data [0.0]
The Digital Services Act (DSA) introduces rules for content moderation and platform governance in the European Union. This study examined compliance with DSA requirements, focusing on Instagram. We develop and apply a multi-level consistency framework to evaluate DSA compliance.
arXiv Detail & Related papers (2025-07-02T15:13:25Z) - Towards a HIPAA Compliant Agentic AI System in Healthcare [3.6185342807265415]
This paper introduces a HIPAA-compliant Agentic AI framework that enforces regulatory compliance through dynamic, context-aware policy enforcement. Our framework integrates three core mechanisms: (1) Attribute-Based Access Control (ABAC) for granular governance, (2) a hybrid PHI sanitization pipeline combining pattern matching and a BERT-based model to minimize leakage, and (3) immutable audit trails for compliance verification.
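The ABAC mechanism named above can be sketched as a minimal attribute-based policy check; the attribute names, the sample policy, and the deny-by-default rule are hypothetical illustrations, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Request:
    # Hypothetical request attributes; real ABAC policies draw on many more.
    role: str           # e.g. "physician", "nurse", "clerk"
    purpose: str        # e.g. "treatment", "billing"
    resource_type: str  # e.g. "phi_record"

# A policy maps a resource type to the attribute values allowed to access it.
POLICY = {
    "phi_record": {
        "roles": {"physician", "nurse"},
        "purposes": {"treatment"},
    },
}

def is_permitted(req: Request) -> bool:
    """Grant access only if every attribute satisfies the resource's rule."""
    rule = POLICY.get(req.resource_type)
    if rule is None:
        return False  # deny by default for resources without a rule
    return req.role in rule["roles"] and req.purpose in rule["purposes"]
```

The granularity comes from evaluating subject, purpose, and resource attributes jointly, rather than assigning coarse per-role permissions.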
arXiv Detail & Related papers (2025-04-24T15:38:20Z) - Rethinking Robustness in Machine Learning: A Posterior Agreement Approach [41.50777631705435]
The Posterior Agreement (PA) theory of model validation provides a principled framework for robustness evaluation. We show that PA offers a reliable analysis of the vulnerabilities in learning algorithms across different shift conditions. Results show that PA provides higher discriminability than accuracy-based measures, while requiring no supervision.
arXiv Detail & Related papers (2025-03-20T16:03:39Z) - Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance [2.3020018305241337]
The paper emphasizes the critical gap in the evaluation of Explainable AI (XAI) due to the lack of standardized and reliable metrics. Current evaluation methods are often fragmented, subjective, and biased, making them prone to manipulation and complicating the assessment of complex models. We advocate for widespread research into developing robust, context-sensitive evaluation metrics.
arXiv Detail & Related papers (2025-02-07T06:54:48Z) - Certifiably Byzantine-Robust Federated Conformal Prediction [49.23374238798428]
We introduce a novel framework Rob-FCP, which executes robust federated conformal prediction effectively countering malicious clients.
We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks.
arXiv Detail & Related papers (2024-06-04T04:43:30Z) - Towards a multi-stakeholder value-based assessment framework for
algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.