Related papers: Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization

URL: http://arxiv.org/abs/2404.16241v1
Date: Wed, 24 Apr 2024 22:58:42 GMT
Title: Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization
Authors: Zahir Alsulaimawi
Abstract summary: We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy. Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy. The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
Score: 2.28438857884398
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study develops a novel framework for privacy-preserving data analytics, addressing the critical challenge of balancing data utility with privacy concerns. We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction while masking sensitive attributes and an Expectation Maximization (EM) approach optimized for structured data privacy. Applied to datasets such as Modified MNIST and CelebrityA, our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy. Our experimental results confirm that these approaches achieve superior privacy protection and retain high utility, making them viable for practical applications where both aspects are crucial. The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types and establishing new benchmarks for utility and confidentiality in data analytics.
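
Illustrative sketch (not the paper's method): the abstract measures privacy by the mutual information between a sensitive attribute and the transformed data. The snippet below shows that quantity on a toy case, using a plug-in estimate of I(S;Z) for discrete samples. The random bit-flipping in `flip_bits` is a hypothetical stand-in for a noise-infusion step, and all names, the flip probability and the sample size are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # Each term is p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def flip_bits(bits, flip_prob, rng):
    """Hypothetical noise infusion: flip each bit independently with flip_prob."""
    return [b ^ (rng.random() < flip_prob) for b in bits]


rng = random.Random(0)  # fixed seed so the toy experiment is repeatable

# Toy sensitive attribute S: a fair coin per record.
sensitive = [rng.randint(0, 1) for _ in range(20000)]

# Release without protection: Z equals S, so I(S;Z) is about H(S), close to 1 bit.
mi_clean = mutual_information(sensitive, sensitive)

# Release with noise: Z is S passed through a bit-flip channel.
released = flip_bits(sensitive, 0.3, rng)
mi_noisy = mutual_information(sensitive, released)

print(f"I(S;Z) without noise: {mi_clean:.3f} bits")
print(f"I(S;Z) with 30% bit flips: {mi_noisy:.3f} bits")
```

For a fair bit and a flip probability of 0.3, theory gives I(S;Z) = 1 - H_b(0.3), roughly 0.12 bits, so the noisy estimate should land near that value. The example says nothing about utility, which the paper evaluates alongside privacy.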

Related papers

Privacy-Utility Trade-off in Data Publication: A Bilevel Optimization Framework with Curvature-Guided Perturbation [22.727580097886747]
We introduce a novel bilevel optimization framework for the publication of private datasets. In the upper-level task, a discriminator guides the generation process to ensure that latent variables are mapped to high-quality samples. In the lower-level task, our framework employs local extrinsic curvature on the data manifold as a quantitative measure of individual vulnerability to MIA.
arXiv Detail & Related papers (2025-09-02T07:44:21Z)
Improving Noise Efficiency in Privacy-preserving Dataset Distillation [59.57846442477106]
We introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality. On CIFAR-10, our method achieves a 10.0% improvement with 50 images per class and 8.3% increase with just one-fifth the distilled set size of previous state-of-the-art methods.
arXiv Detail & Related papers (2025-08-03T13:15:52Z)
Optimal Allocation of Privacy Budget on Hierarchical Data Release [48.96399034594329]
This paper addresses the problem of optimal privacy budget allocation for hierarchical data release. It aims to maximize data utility subject to a total privacy budget while considering the inherent trade-offs between data granularity and privacy loss.
arXiv Detail & Related papers (2025-05-16T05:25:11Z)
Optimizing the Privacy-Utility Balance using Synthetic Data and Configurable Perturbation Pipelines [0.0]
This paper explores the strategic use of modern synthetic data generation and advanced data perturbation techniques to enhance security, maintain analytical utility, and improve operational efficiency when managing large datasets. The goal is to create realistic, privacy-preserving datasets that retain high utility for complex machine learning tasks and analytics, a critical need in data-sensitive industries such as BFSI, Healthcare, Retail, and Telecommunications.
arXiv Detail & Related papers (2025-04-24T15:52:53Z)
Adaptive Clipping for Privacy-Preserving Few-Shot Learning: Enhancing Generalization with Limited Data [12.614480013684759]
We introduce a novel approach called Meta-Clip to enhance the utility of privacy-preserving few-shot learning methods. By dynamically adjusting clipping thresholds during the training process, our Adaptive Clipping method provides fine-grained control over the disclosure of sensitive information. We demonstrate the effectiveness of our approach in minimizing utility degradation, showcasing a superior privacy-preserving trade-off compared to existing privacy-preserving techniques.
arXiv Detail & Related papers (2025-03-27T05:14:18Z)
Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning [0.5452584641316627]
Our research identifies key limitations in existing optimization models for privacy preservation. We propose a novel multi-objective optimization model that simultaneously minimizes information loss and maximizes protection against attacks.
arXiv Detail & Related papers (2025-01-02T01:52:36Z)
SafeSynthDP: Leveraging Large Language Models for Privacy-Preserving Synthetic Data Generation Using Differential Privacy [0.0]
We investigate the capability of Large Language Models (LLMs) to generate synthetic datasets with Differential Privacy (DP) mechanisms. Our approach incorporates DP-based noise injection methods, including Laplace and Gaussian distributions, into the data generation process. We then evaluate the utility of these DP-enhanced synthetic datasets by comparing the performance of ML models trained on them against models trained on the original data.
arXiv Detail & Related papers (2024-12-30T01:10:10Z)
Differentially Private Federated Learning of Diffusion Models for Synthetic Tabular Data Generation [5.182014186927255]
We introduce the DP-Fed-FinDiff framework, a novel integration of Differential Privacy, Federated Learning and Denoising Diffusion Probabilistic Models. We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets. The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains.
arXiv Detail & Related papers (2024-12-20T17:30:58Z)
DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm, DP-CDA. Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure privacy guarantees. Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z)
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. RAG systems may face severe privacy risks when retrieving private data. We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners. FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks. We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
Data Collaboration Analysis Over Matrix Manifolds [0.0]
Privacy-Preserving Machine Learning (PPML) addresses this challenge by safeguarding sensitive information. The NRI-DC framework emerges as an innovative approach, potentially resolving the 'data island' issue among institutions. This study establishes a rigorous theoretical foundation for these collaboration functions and introduces new formulations.
arXiv Detail & Related papers (2024-03-05T08:52:16Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning. We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z)
Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters. It can achieve a personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z)
Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge. Existing private generative models are struggling with the utility of synthetic samples. We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing. Since the data involved are often sensitive, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z)
Efficient Logistic Regression with Local Differential Privacy [0.0]
Internet of Things devices are expanding rapidly and generating huge amounts of data. There is an increasing need to explore data collected from these devices. Collaborative learning provides a strategic solution for Internet of Things settings but also raises public concern over data privacy.
arXiv Detail & Related papers (2022-02-05T22:44:03Z)
P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model [23.91327154831855]
This paper proposes a privacy-preserving phased generative model (P3GM) for releasing sensitive data. P3GM employs a two-phase learning process to make it robust against noise and to increase learning efficiency. Compared with state-of-the-art methods, our generated samples look less noisy and are closer to the original data in terms of data diversity.
arXiv Detail & Related papers (2020-06-22T09:47:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.