Sell Data to AI Algorithms Without Revealing It: Secure Data Valuation and Sharing via Homomorphic Encryption
- URL: http://arxiv.org/abs/2512.06033v1
- Date: Thu, 04 Dec 2025 16:35:09 GMT
- Title: Sell Data to AI Algorithms Without Revealing It: Secure Data Valuation and Sharing via Homomorphic Encryption
- Authors: Michael Yang, Ruijiang Gao, Zhiqiang, Zheng,
- Abstract summary: We introduce the Trustworthy Influence Protocol (TIP), a privacy-preserving framework that enables buyers to quantify the utility of external data without decrypting the raw assets.<n>By integrating Homomorphic Encryption with gradient-based influence functions, our approach allows for the precise, blinded scoring of data points against a buyer's specific AI model.<n> Empirical simulations in healthcare and generative AI domains validate the framework's economic potential.
- Score: 10.12846924939717
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapid expansion of Artificial Intelligence is hindered by a fundamental friction in data markets: the value-privacy dilemma, where buyers cannot verify a dataset's utility without inspection, yet inspection may expose the data (Arrow's Information Paradox). We resolve this challenge by introducing the Trustworthy Influence Protocol (TIP), a privacy-preserving framework that enables prospective buyers to quantify the utility of external data without ever decrypting the raw assets. By integrating Homomorphic Encryption with gradient-based influence functions, our approach allows for the precise, blinded scoring of data points against a buyer's specific AI model. To ensure scalability for Large Language Models (LLMs), we employ low-rank gradient projections that reduce computational overhead while maintaining near-perfect fidelity to plaintext baselines, as demonstrated across BERT and GPT-2 architectures. Empirical simulations in healthcare and generative AI domains validate the framework's economic potential: we show that encrypted valuation signals achieve a high correlation with realized clinical utility and reveal a heavy-tailed distribution of data value in pre-training corpora where a minority of texts drive capability while the majority degrades it. These findings challenge prevailing flat-rate compensation models and offer a scalable technical foundation for a meritocratic, secure data economy.
Related papers
- Learning More with Less: A Generalizable, Self-Supervised Framework for Privacy-Preserving Capacity Estimation with EV Charging Data [84.37348569981307]
We propose a first-of-its-kind capacity estimation model based on self-supervised pre-training.<n>Our model consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-10-05T08:58:35Z) - SynBench: A Benchmark for Differentially Private Text Generation [35.908455649647784]
Data-driven decision support in high-stakes domains like healthcare and finance faces significant barriers to data sharing.<n>Recent generative AI models, such as large language models, have shown impressive performance in open-domain tasks.<n>But their adoption in sensitive environments remains limited by unpredictable behaviors and insufficient privacy-preserving datasets.
arXiv Detail & Related papers (2025-09-18T03:57:50Z) - On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis.<n>We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI [6.671649946926508]
We present the first unified large-scale empirical study of privacy-fairness-utility trade-offs in Federated Learning (FL)<n>We compare fairness-awares with Differential Privacy (DP), Homomorphic Encryption (HE), and Secure Multi-Party Encryption (SMC)<n>We uncover unexpected interactions: DP mechanisms can negatively impact fairness, skew, and fairness-awares can inadvertently reduce privacy effectiveness.
arXiv Detail & Related papers (2025-03-20T15:31:01Z) - How Breakable Is Privacy: Probing and Resisting Model Inversion Attacks in Collaborative Inference [13.453033795109155]
Collaborative inference improves computational efficiency for edge devices by transmitting intermediate features to cloud models.<n>There is no established criterion for assessing the difficulty of model inversion attacks (MIAs)<n>We propose the first theoretical criterion to assess MIA difficulty in CI, identifying mutual information, entropy, and effective information volume as key influencing factors.
arXiv Detail & Related papers (2025-01-01T13:00:01Z) - Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing [74.58071278710896]
generative AI has attracted much attention from both academic and industrial fields.
Secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection/ acquirement.
arXiv Detail & Related papers (2024-05-17T04:00:58Z) - Provably Unlearnable Data Examples [27.24152626809928]
Efforts have been undertaken to render shared data unlearnable for unauthorized models in the wild.
We propose a mechanism for certifying the so-called $(q, eta)$-Learnability of an unlearnable dataset.
A lower certified $(q, eta)$-Learnability indicates a more robust and effective protection over the dataset.
arXiv Detail & Related papers (2024-05-06T09:48:47Z) - Reconciling AI Performance and Data Reconstruction Resilience for
Medical Imaging [52.578054703818125]
Artificial Intelligence (AI) models are vulnerable to information leakage of their training data, which can be highly sensitive.
Differential Privacy (DP) aims to circumvent these susceptibilities by setting a quantifiable privacy budget.
We show that using very large privacy budgets can render reconstruction attacks impossible, while drops in performance are negligible.
arXiv Detail & Related papers (2023-12-05T12:21:30Z) - RARE: Robust Masked Graph Autoencoder [45.485891794905946]
Masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm.
We propose a novel SGP method termed Robust mAsked gRaph autoEncoder (RARE) to improve the certainty in inferring masked data.
arXiv Detail & Related papers (2023-04-04T03:35:29Z) - Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive
Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch descent gradient.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, ability to explain the decisions, address the bias in their training data, are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.