Privacy-Utility Trade-off in Data Publication: A Bilevel Optimization Framework with Curvature-Guided Perturbation
- URL: http://arxiv.org/abs/2509.02048v1
- Date: Tue, 02 Sep 2025 07:44:21 GMT
- Title: Privacy-Utility Trade-off in Data Publication: A Bilevel Optimization Framework with Curvature-Guided Perturbation
- Authors: Yi Yin, Guangquan Zhang, Hua Zuo, Jie Lu
- Abstract summary: We introduce a novel bilevel optimization framework for the publication of private datasets. In the upper-level task, a discriminator guides the generation process to ensure that latent variables are mapped to high-quality samples. In the lower-level task, our framework employs local extrinsic curvature on the data manifold as a quantitative measure of individual vulnerability to MIA.
- Score: 22.727580097886747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models require datasets for effective training, but directly sharing raw data poses significant privacy risks such as membership inference attacks (MIA). To mitigate the risk, privacy-preserving techniques such as data perturbation, generalization, and synthetic data generation are commonly utilized. However, these methods often degrade data accuracy, specificity, and diversity, limiting the performance of downstream tasks and thus reducing data utility. Therefore, striking an optimal balance between privacy preservation and data utility remains a critical challenge. To address this issue, we introduce a novel bilevel optimization framework for the publication of private datasets, where the upper-level task focuses on data utility and the lower-level task focuses on data privacy. In the upper-level task, a discriminator guides the generation process to ensure that perturbed latent variables are mapped to high-quality samples, maintaining fidelity for downstream tasks. In the lower-level task, our framework employs local extrinsic curvature on the data manifold as a quantitative measure of individual vulnerability to MIA, providing a geometric foundation for targeted privacy protection. By perturbing samples toward low-curvature regions, our method effectively suppresses distinctive feature combinations that are vulnerable to MIA. Through alternating optimization of both objectives, we achieve a synergistic balance between privacy and utility. Extensive experimental evaluations demonstrate that our method not only enhances resistance to MIA in downstream tasks but also surpasses existing methods in terms of sample quality and diversity.
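The alternating scheme described in the abstract can be sketched in miniature. The toy example below is an illustration under stated assumptions, not the authors' implementation: it uses a local-PCA residual as a crude stand-in for local extrinsic curvature on a 1-D manifold, descends it by finite differences (the lower-level privacy step), and pulls the sample back toward its original value as a stand-in for the discriminator-driven utility step. The helper names `curvature_proxy` and `perturb_toward_low_curvature` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def curvature_proxy(x, data, k=10):
    """Crude stand-in for local extrinsic curvature: the residual of a
    local tangent-line (PCA) fit over the k nearest neighbours of x.
    A large residual means the manifold bends sharply near x."""
    dists = np.linalg.norm(data - x, axis=1)
    nbrs = data[np.argsort(dists)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    tangent = vt[0]  # principal (tangent) direction
    resid = centered - np.outer(centered @ tangent, tangent)
    return float(np.mean(np.sum(resid ** 2, axis=1)))

def perturb_toward_low_curvature(x, data, step=0.05, eps=1e-3):
    """One finite-difference descent step on the curvature proxy
    (the lower-level privacy objective)."""
    base = curvature_proxy(x, data)
    grad = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        grad[i] = (curvature_proxy(xp, data) - base) / eps
    norm = np.linalg.norm(grad)
    return x - step * grad / norm if norm > 0 else x

# Toy manifold: a near-noiseless parabola; curvature peaks at the vertex.
t = rng.uniform(-2.0, 2.0, size=200)
data = np.stack([t, t ** 2], axis=1) + 0.001 * rng.normal(size=(200, 2))

x0 = np.array([0.0, 0.05])  # a sample near the high-curvature vertex
x = x0.copy()
for _ in range(20):
    x = perturb_toward_low_curvature(x, data)  # privacy step (lower level)
    x = 0.9 * x + 0.1 * x0                     # utility pull (upper level)
```

The utility pull here is a deliberately simplistic proxy; the paper's upper level instead trains a discriminator so that perturbed latent variables still decode to high-quality samples.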
Related papers
- Improving Noise Efficiency in Privacy-preserving Dataset Distillation [59.57846442477106]
We introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality. On CIFAR-10, our method achieves a 10.0% improvement with 50 images per class and an 8.3% increase with just one-fifth the distilled set size of previous state-of-the-art methods.
arXiv Detail & Related papers (2025-08-03T13:15:52Z)
- A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis [11.135689359531105]
This paper proposes a novel real-value negative survey model, termed RVNS, for the first time in the field of real-value sensitive information collection. The RVNS model exempts users from the necessity of discretizing their data and only requires them to sample a set of data from a range that deviates from their actual sensitive details.
arXiv Detail & Related papers (2025-07-23T04:05:33Z)
- Adaptive Clipping for Privacy-Preserving Few-Shot Learning: Enhancing Generalization with Limited Data [12.614480013684759]
We introduce a novel approach called Meta-Clip to enhance the utility of privacy-preserving few-shot learning methods. By dynamically adjusting clipping thresholds during the training process, our Adaptive Clipping method provides fine-grained control over the disclosure of sensitive information. We demonstrate the effectiveness of our approach in minimizing utility degradation, showcasing a superior privacy-preserving trade-off compared to existing privacy-preserving techniques.
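One common way to adjust clipping thresholds dynamically, which this summary gestures at, is to track a target quantile of the per-example gradient norms. The sketch below follows that generic quantile-tracking recipe with simulated gradients; it is not necessarily Meta-Clip's exact rule, and `dp_sgd_step` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_sgd_step(grads, clip, noise_mult):
    """Clip per-example gradients to norm `clip`, average, and add
    Gaussian noise scaled to the clipping bound (standard DP-SGD
    aggregation)."""
    norms = np.linalg.norm(grads, axis=1)
    factors = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    clipped = grads * factors[:, None]
    noisy_mean = clipped.mean(axis=0) + rng.normal(
        scale=noise_mult * clip / len(grads), size=grads.shape[1])
    return noisy_mean, norms

clip, target_q, lr_c = 1.0, 0.5, 0.2
for _ in range(100):
    # Stand-in per-example gradients; a real run would backprop these.
    grads = rng.normal(scale=3.0, size=(32, 5))
    _, norms = dp_sgd_step(grads, clip, noise_mult=1.1)
    # Geometric update nudging `clip` toward the target quantile of the
    # observed gradient norms: raise it when too many examples clip.
    frac_clipped = float(np.mean(norms > clip))
    clip *= np.exp(lr_c * (frac_clipped - (1.0 - target_q)))
```

Starting from a threshold of 1.0, which clips every simulated gradient, the update drives `clip` up toward the median gradient norm, so far fewer examples are distorted by clipping while the noise scale stays proportional to the bound.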
arXiv Detail & Related papers (2025-03-27T05:14:18Z)
- Personalized Language Models via Privacy-Preserving Evolutionary Model Merging [53.97323896430374]
Personalization in language models aims to tailor model behavior to individual users or user groups. We propose Privacy-Preserving Model Merging via Evolutionary Algorithms (PriME). PriME employs gradient-free methods to directly optimize utility while reducing privacy risks. Experiments on the LaMP benchmark show that PriME consistently outperforms a range of baselines, achieving up to a 45% improvement in task performance.
arXiv Detail & Related papers (2025-03-23T09:46:07Z)
- Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications. Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility. We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
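As a rough illustration of why robust statistics help here, the snippet below releases a trimmed mean of bounded per-user values with Laplace noise calibrated to its sensitivity. This is a generic sketch under stated assumptions, not the paper's linear-time algorithm; `trimmed_mean` and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def trimmed_mean(values, trim_frac):
    """Drop the lowest and highest trim_frac of values; average the rest."""
    v = np.sort(values)
    k = int(len(v) * trim_frac)
    return float(v[k:len(v) - k].mean())

# Per-user statistics with a handful of heavy outlier users.
users = np.concatenate([rng.normal(5.0, 1.0, 95), rng.normal(60.0, 5.0, 5)])

plain_mean = float(users.mean())             # dragged upward by outliers
robust = trimmed_mean(users, trim_frac=0.1)  # stays near the bulk

# Private release: clipping each user's value to [lo, hi] bounds any one
# user's influence, so changing a single user moves the trimmed mean by
# at most (hi - lo) / (n - 2k); Laplace noise at that scale gives
# eps-DP for this single statistic.
lo, hi, eps = 0.0, 10.0, 1.0
n, k = len(users), int(len(users) * 0.1)
sensitivity = (hi - lo) / (n - 2 * k)
private = trimmed_mean(np.clip(users, lo, hi), 0.1) + rng.laplace(
    scale=sensitivity / eps)
```

Because trimming discards the extremes, the noisy release stays close to the bulk of the users, whereas the plain mean both leaks more (higher sensitivity before clipping) and is biased by the outliers.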
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
- Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning Application [0.5452584641316627]
Various techniques have been proposed to address privacy concerns in data sharing. These methods often degrade data utility, impacting the performance of machine learning (ML) models. We propose a novel multi-objective optimization model that simultaneously minimizes information loss and maximizes protection against attacks.
arXiv Detail & Related papers (2025-01-02T01:52:36Z)
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z)
- Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
arXiv Detail & Related papers (2024-04-24T22:58:42Z)
- Data Collaboration Analysis with Orthonormal Basis Selection and Alignment [2.928964540437144]
Data Collaboration (DC) enables multiple parties to jointly train a model without exposing their private datasets. Existing theory asserts that any target basis spanning the same subspace as the secret bases should suffice. We introduce Orthonormal Data Collaboration (ODC), a novel DC framework that explicitly enforces orthonormality constraints on both the secret and target bases.
arXiv Detail & Related papers (2024-03-05T08:52:16Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.