DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming
- URL: http://arxiv.org/abs/2509.00693v1
- Date: Sun, 31 Aug 2025 04:18:42 GMT
- Title: DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming
- Authors: Arun Vignesh Malarkkan, Haoyue Bai, Anjali Kaushik, Yanjie Fu,
- Abstract summary: We propose DELTA, a two-phase variational disentangled generative learning framework.<n>Phase I uses policy-guided reinforcement learning to discover feature transformations with downstream task utility, without any regard to privacy inferability.<n>Phase II employs a variational LSTM seq2seq encoder-decoder with a utility-privacy disentangled latent space design and adversarial-causal disentanglement regularization to suppress privacy signals.
- Score: 20.87548031005583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world applications, domain data often contains identifiable or sensitive attributes, is subject to strict regulations (e.g., HIPAA, GDPR), and requires explicit data feature engineering for interpretability and transparency. Existing feature engineering primarily focuses on advancing downstream task performance, often risking privacy leakage. We generalize this learning task under such new requirements as Privacy-Preserving Data Reprogramming (PPDR): given a dataset, transforming features to maximize target attribute prediction accuracy while minimizing sensitive attribute prediction accuracy. PPDR poses challenges for existing systems: 1) generating high-utility feature transformations without being overwhelmed by a large search space, and 2) disentangling and eliminating sensitive information from utility-oriented features to reduce privacy inferability. To tackle these challenges, we propose DELTA, a two-phase variational disentangled generative learning framework. Phase I uses policy-guided reinforcement learning to discover feature transformations with downstream task utility, without any regard to privacy inferability. Phase II employs a variational LSTM seq2seq encoder-decoder with a utility-privacy disentangled latent space design and adversarial-causal disentanglement regularization to suppress privacy signals during feature generation. Experiments on eight datasets show DELTA improves predictive performance by ~9.3% and reduces privacy leakage by ~35%, demonstrating robust, privacy-aware data transformation.
Related papers
- Unintended Memorization of Sensitive Information in Fine-Tuned Language Models [24.228889351240838]
Fine-tuning Large Language Models (LLMs) on sensitive datasets carries a substantial risk of unintended memorization and leakage of Personally Identifiable Information (PII)<n>We design controlled extraction probes to quantify unintended PII memorization and study how factors such as language, PII frequency, task type, and model size influence memorization behavior.
arXiv Detail & Related papers (2026-01-24T15:08:45Z) - Adversary-Aware Private Inference over Wireless Channels [51.93574339176914]
AI-based sensing at wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications.<n>As sensitive personal data can be reconstructed by an adversary, transformation of the features are required to reduce the risk of privacy violations.<n>We propose a novel framework for privacy-preserving AI-based sensing, where devices apply transformations of extracted features before transmission to a model server.
arXiv Detail & Related papers (2025-10-23T13:02:14Z) - RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting [17.294176570269]
We propose a reinforcement learning framework that fine-tunes a large language model (LLM) using a composite reward function.<n>The privacy reward combines semantic cues with structural patterns derived from a minimum spanning tree (MST) over latent representations.<n> Empirical results show that the proposed method significantly enhances author obfuscation and privacy metrics without degrading semantic quality.
arXiv Detail & Related papers (2025-08-25T04:38:19Z) - Improving Noise Efficiency in Privacy-preserving Dataset Distillation [59.57846442477106]
We introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality.<n>On CIFAR-10, our method achieves a textbf10.0% improvement with 50 images per class and textbf8.3% increase with just textbfone-fifth the distilled set size of previous state-of-the-art methods.
arXiv Detail & Related papers (2025-08-03T13:15:52Z) - Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG)<n>FedE4RAG facilitates collaborative training of client-side RAG retrieval models.<n>We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z) - Dual Utilization of Perturbation for Stream Data Publication under Local Differential Privacy [10.07017446059039]
Local differential privacy (LDP) has emerged as a promising standard.<n>Applying LDP to stream data presents significant challenges, as stream data often involves a large or even infinite number of values.<n>We introduce the Iterative Perturbation IPP method, which utilizes current perturbed results to calibrate the subsequent perturbation process.<n>We prove that these three algorithms satisfy $w$-event differential privacy while significantly improving utility.
arXiv Detail & Related papers (2025-04-21T09:51:18Z) - VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences.<n>Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and achieves an accuracy up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z) - PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII)<n>Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage.<n>We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z) - Collaborative Inference over Wireless Channels with Feature Differential Privacy [57.68286389879283]
Collaborative inference among multiple wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications.
transmitting extracted features poses a significant privacy risk, as sensitive personal data can be exposed during the process.
We propose a novel privacy-preserving collaborative inference mechanism, wherein each edge device in the network secures the privacy of extracted features before transmitting them to a central server for inference.
arXiv Detail & Related papers (2024-10-25T18:11:02Z) - Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.<n>RAG systems may face severe privacy risks when retrieving private data.<n>We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z) - MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z) - TeD-SPAD: Temporal Distinctiveness for Self-supervised
Privacy-preservation for video Anomaly Detection [59.04634695294402]
Video anomaly detection (VAD) without human monitoring is a complex computer vision task.
Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information.
We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.
arXiv Detail & Related papers (2023-08-21T22:42:55Z) - Over-the-Air Federated Learning with Privacy Protection via Correlated
Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.