Smart Data Portfolios: A Quantitative Framework for Input Governance in AI
- URL: http://arxiv.org/abs/2512.16452v1
- Date: Thu, 18 Dec 2025 12:15:27 GMT
- Title: Smart Data Portfolios: A Quantitative Framework for Input Governance in AI
- Authors: A. Talha Yalta, A. Yasemin Yalta
- Abstract summary: Growing concerns about fairness, privacy, robustness, and transparency have made the explainability of automated decisions a central expectation of AI governance. We introduce the Smart Data Portfolio framework, which treats data categories as productive but risk-bearing assets. Regulators shape the resulting Governance-Efficient Frontier through risk caps, admissible categories, and weight bands.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Growing concerns about fairness, privacy, robustness, and transparency have made it a central expectation of AI governance that automated decisions be explainable by institutions and intelligible to affected parties. We introduce the Smart Data Portfolio (SDP) framework, which treats data categories as productive but risk-bearing assets, formalizing input governance as an information-risk trade-off. Within this framework, we define two portfolio-level quantities, Informational Return and Governance-Adjusted Risk, whose interaction characterizes data mixtures and generates a Governance-Efficient Frontier. Regulators shape this frontier through risk caps, admissible categories, and weight bands that translate fairness, privacy, robustness, and provenance requirements into measurable constraints on data allocation while preserving model flexibility. A telecommunications illustration shows how different AI services require distinct portfolios within a common governance structure. The framework offers a familiar portfolio logic as an input-level explanation layer suited to the large-scale deployment of AI systems.
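To make the portfolio logic concrete, here is a minimal Python sketch of tracing a Governance-Efficient Frontier: each data category carries an Informational Return and a Governance-Adjusted Risk, and regulator-style risk caps and weight bands bound the admissible mixtures. The category names, numeric values, and the linear return/risk aggregation are invented assumptions for illustration, not the paper's specification.

```python
# Illustrative sketch of a Governance-Efficient Frontier; category names,
# numbers, and the linear return/risk model are invented, not the paper's.
from itertools import product

CATEGORIES = ["usage_logs", "demographics", "location_traces"]  # hypothetical
INFO_RETURN = {"usage_logs": 0.08, "demographics": 0.12, "location_traces": 0.15}
GOV_RISK = {"usage_logs": 0.02, "demographics": 0.10, "location_traces": 0.20}
WEIGHT_BANDS = {"usage_logs": (0.2, 0.8),        # regulator-style weight bands
                "demographics": (0.0, 0.5),
                "location_traces": (0.0, 0.4)}

def portfolio(weights):
    """Portfolio-level Informational Return and Governance-Adjusted Risk,
    modeled here as simple weighted sums over data categories."""
    ret = sum(w * INFO_RETURN[c] for c, w in zip(CATEGORIES, weights))
    risk = sum(w * GOV_RISK[c] for c, w in zip(CATEGORIES, weights))
    return ret, risk

def frontier(risk_caps, steps=20):
    """Grid-search the highest-return admissible mixture under each risk cap."""
    grid = [i / steps for i in range(steps + 1)]
    best = {}
    for w in product(grid, repeat=len(CATEGORIES)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # weights must form a full allocation
        bands = (WEIGHT_BANDS[c] for c in CATEGORIES)
        if any(not lo <= wi <= hi for wi, (lo, hi) in zip(w, bands)):
            continue  # respect per-category weight bands
        ret, risk = portfolio(w)
        for cap in risk_caps:
            if risk <= cap and ret > best.get(cap, (-1.0, None))[0]:
                best[cap] = (ret, w)  # new frontier point under this cap
    return best

for cap, (ret, w) in sorted(frontier([0.05, 0.08, 0.12]).items()):
    print(f"risk cap {cap:.2f}: return {ret:.3f} with weights {w}")
```

Loosening the risk cap admits riskier but more informative categories, which is exactly the frontier shape the abstract describes: stricter governance trades informational return for lower governance-adjusted risk.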
Related papers
- An Agentic Software Framework for Data Governance under DPDP [0.0]
In India, the Digital Personal Data Protection Act mandates rigorous data privacy and compliance requirements. From a software development perspective, traditional compliance tools often rely on hard-coded rules and static configurations. This paper introduces a novel agentic framework to embed compliance logic directly into software agents that govern and adapt data policies.
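A minimal sketch of the general idea, an agent that evaluates data-processing requests against declared consent and swappable policy rules, might look as follows. All rule names and record fields are invented; this is not the paper's framework.

```python
# Toy sketch of an agent embedding compliance logic into data-policy decisions.
# Rule names and record fields are invented; this is not the paper's framework.
from dataclasses import dataclass, field

@dataclass
class DataRequest:
    purpose: str                # e.g. "marketing", "service_delivery"
    fields: list                # personal-data fields to be processed
    consent_purposes: set = field(default_factory=set)  # purposes consented to

class CompliancePolicyAgent:
    """Evaluates requests against DPDP-style rules; rules can be swapped at runtime."""
    def __init__(self):
        self.rules = [self.purpose_limitation, self.data_minimisation]

    def purpose_limitation(self, req):
        if req.purpose not in req.consent_purposes:
            return f"purpose '{req.purpose}' lacks consent"

    def data_minimisation(self, req, allowed=("name", "email")):
        extra = [f for f in req.fields if f not in allowed]
        if extra:
            return f"fields {extra} exceed what the purpose requires"

    def decide(self, req):
        violations = [v for rule in self.rules if (v := rule(req))]
        return ("deny", violations) if violations else ("allow", [])

agent = CompliancePolicyAgent()
req = DataRequest(purpose="marketing", fields=["name", "location"],
                  consent_purposes={"service_delivery"})
print(agent.decide(req))  # ('deny', [...]) -- both rules fail for this request
```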
arXiv Detail & Related papers (2026-01-03T07:46:43Z)
- SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling [50.66950115630554]
Retrieval-Augmented Generation (RAG) and its multimodal extension (MRAG) significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs). However, retrieval and multimodal fusion obscure content provenance, rendering existing membership inference methods unable to reliably attribute generated outputs to pre-training, external retrieval, or user input, thus undermining privacy leakage accountability. We propose the first Source-aware Membership Audit (SMA), which enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities.
arXiv Detail & Related papers (2025-08-12T17:32:24Z)
- Rethinking Data Protection in the (Generative) Artificial Intelligence Era [138.07763415496288]
We propose a four-level taxonomy that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline.
arXiv Detail & Related papers (2025-07-03T02:45:51Z)
- DP-RTFL: Differentially Private Resilient Temporal Federated Learning for Trustworthy AI in Regulated Industries [0.0]
This paper introduces Differentially Private Resilient Temporal Federated Learning (DP-RTFL), designed to ensure training continuity, precise state recovery, and strong data privacy. The framework is particularly suited for critical applications like credit risk assessment using sensitive financial data.
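As a rough illustration of the differential-privacy ingredient in such designs (not DP-RTFL itself), the standard recipe clips each client update and adds calibrated Gaussian noise before aggregation. The clip norm and noise multiplier below are arbitrary placeholder values.

```python
# Minimal sketch of the clip-and-noise recipe used in DP federated averaging;
# this is the generic mechanism, not the DP-RTFL algorithm, and the clip norm
# and noise multiplier are arbitrary placeholders.
import math
import random

def clip(update, max_norm=1.0):
    """Scale a client update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_aggregate(client_updates, max_norm=1.0, noise_multiplier=1.1):
    """Average clipped updates, then add Gaussian noise scaled to the clip norm."""
    clipped = [clip(u, max_norm) for u in client_updates]
    n = len(clipped)
    sigma = noise_multiplier * max_norm / n  # noise std per averaged coordinate
    return [sum(col) / n + random.gauss(0.0, sigma) for col in zip(*clipped)]

updates = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.4], [-0.3, 0.8, 1.5]]
print(dp_aggregate(updates))  # noisy average; privacy grows with noise_multiplier
```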
arXiv Detail & Related papers (2025-05-27T16:30:25Z)
- Policy Frameworks for Transparent Chain-of-Thought Reasoning in Large Language Models [1.0088912103548195]
Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by decomposing complex problems into step-by-step solutions. Current CoT disclosure policies vary widely across models in visibility, API access, and pricing strategies, lacking a unified policy framework. We propose a tiered-access policy framework that balances transparency, accountability, and security by tailoring CoT availability to academic, business, and general users.
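A tiered-access disclosure policy of this kind is straightforward to express in code. The sketch below is a toy rendering of the idea, with tier names, rate limits, and redaction behavior invented for illustration rather than taken from the paper.

```python
# Toy tiered-access policy for chain-of-thought visibility; the tier names,
# rate limits, and redaction behavior are illustrative, not the paper's spec.
TIER_POLICY = {
    "academic": {"cot": "full", "rate_limit": 1000},      # full reasoning traces
    "business": {"cot": "summary", "rate_limit": 10000},  # summarized traces
    "general": {"cot": "hidden", "rate_limit": 100},      # final answer only
}

def render_response(user_tier, answer, cot_trace):
    """Return the answer with as much reasoning as the user's tier permits."""
    policy = TIER_POLICY.get(user_tier, TIER_POLICY["general"])  # strictest default
    if policy["cot"] == "full":
        return f"{answer}\n--- reasoning ---\n{cot_trace}"
    if policy["cot"] == "summary":
        first_step = cot_trace.splitlines()[0] if cot_trace else ""
        return f"{answer}\n--- reasoning (summary) ---\n{first_step} ..."
    return answer

print(render_response("business", "x = 4", "Step 1: isolate x.\nStep 2: divide."))
```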
arXiv Detail & Related papers (2025-03-14T19:54:18Z)
- Balancing Confidentiality and Transparency for Blockchain-based Process-Aware Information Systems [43.253676241213626]
We propose an architecture for blockchain-based PAISs to preserve confidentiality and transparency. Smart contracts enact, enforce, and store public interactions, while attribute-based encryption techniques are adopted to specify access grants to confidential information. We assess the security of our solution through a systematic threat model analysis and evaluate its practical feasibility.
arXiv Detail & Related papers (2024-12-07T20:18:36Z)
- Trustworthy AI: Securing Sensitive Data in Large Language Models [0.0]
Large Language Models (LLMs) have transformed natural language processing (NLP) by enabling robust text generation and understanding.
This paper proposes a comprehensive framework for embedding trust mechanisms into LLMs to dynamically control the disclosure of sensitive information.
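As a crude stand-in for such trust mechanisms (the paper's design is presumably richer), one can gate model output through a sensitivity filter before release; the patterns and role clearances below are illustrative assumptions.

```python
# Crude stand-in for dynamic disclosure control: gate model output through a
# sensitivity filter before release. Patterns and roles are invented examples.
import re

SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
ROLE_CLEARANCE = {"admin": {"ssn", "email"}, "user": set()}  # visible categories

def controlled_disclosure(text, role):
    """Redact every sensitive category the caller's role is not cleared for."""
    allowed = ROLE_CLEARANCE.get(role, set())
    for label, pattern in SENSITIVE_PATTERNS.items():
        if label not in allowed:
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(controlled_disclosure("Contact jane@example.com, SSN 123-45-6789.", "user"))
```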
arXiv Detail & Related papers (2024-09-26T19:02:33Z)
- Coordinated Flaw Disclosure for AI: Beyond Security Vulnerabilities [1.3225694028747144]
We propose a Coordinated Flaw Disclosure (CFD) framework tailored to the complexities of machine learning (ML) issues.
Our framework introduces innovations such as extended model cards, dynamic scope expansion, an independent adjudication panel, and an automated verification process.
We argue that CFD could significantly enhance public trust in AI systems.
arXiv Detail & Related papers (2024-02-10T20:39:04Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should account for several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
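The trade-off named in the title can be sketched as a weighted scalarization over three toy objective functions. This is a simplification with invented scoring functions: a true multi-objective approach would keep the objectives separate and search for a Pareto front rather than fixing weights.

```python
# Toy scalarization of the three counterfactual objectives named in the title.
# Real multi-objective methods (e.g., evolutionary search for a Pareto front)
# would keep these separate; fixed weights are a simplification.
def change_intensity(x, cf):
    """L1 distance: how much the counterfactual perturbs the original input."""
    return sum(abs(a - b) for a, b in zip(x, cf))

def plausibility(cf, feature_ranges):
    """Fraction of features that stay inside observed data ranges (toy proxy)."""
    inside = sum(lo <= v <= hi for v, (lo, hi) in zip(cf, feature_ranges))
    return inside / len(cf)

def adversarial_power(model, cf, target):
    """1 if the counterfactual flips the model to the target class, else 0."""
    return float(model(cf) == target)

def score(model, x, cf, target, ranges, w=(1.0, 1.0, 0.5)):
    """Higher is better: reward flipping and plausibility, penalize big changes."""
    return (w[0] * adversarial_power(model, cf, target)
            + w[1] * plausibility(cf, ranges)
            - w[2] * change_intensity(x, cf))

toy_model = lambda v: int(sum(v) > 1.0)  # stand-in classifier
print(score(toy_model, [0.2, 0.3], [0.6, 0.7], 1, [(0.0, 1.0), (0.0, 1.0)]))
```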
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in the training data are some of the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)