Differentially Private Truncation of Unbounded Data via Public Second Moments
- URL: http://arxiv.org/abs/2602.22282v1
- Date: Wed, 25 Feb 2026 12:21:30 GMT
- Title: Differentially Private Truncation of Unbounded Data via Public Second Moments
- Authors: Zilong Cao, Xuan Bi, Hai Zhang
- Abstract summary: We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix. PMT substantially improves the accuracy and stability of differentially private models.
- Score: 4.662174186673445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data privacy is essential in the AI era, and differential privacy (DP) is one of the gold-standard solutions. However, DP is typically applicable only when the data have a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: the data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with significantly strengthened resistance to DP noise. Furthermore, we demonstrate the applicability of PMT to penalized and generalized linear regression. Specifically, we design new loss functions and algorithms that ensure solutions in the transformed space can be mapped back to the original domain. We establish the improvements in DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
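To make the mechanism concrete, here is a minimal sketch of the whitening-and-truncation step described above, assuming a straightforward reading of the abstract: private rows are whitened with the public second-moment matrix, then clipped to a radius depending only on the dimension and sample size. The radius formula `sqrt(d * log n)`, the function name `pmt_transform`, and the numerical tolerances are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def pmt_transform(X_priv, X_pub, radius=None):
    """Whiten private rows with the public second-moment matrix,
    then truncate each row to a data-independent radius."""
    n, d = X_priv.shape
    # Public second-moment matrix (non-private by assumption).
    M_pub = X_pub.T @ X_pub / X_pub.shape[0]
    # Inverse square root of M_pub via eigendecomposition.
    evals, evecs = np.linalg.eigh(M_pub)
    M_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(np.clip(evals, 1e-12, None))) @ evecs.T
    Z = X_priv @ M_inv_sqrt                   # transformed private data
    if radius is None:
        radius = np.sqrt(d * np.log(n))       # placeholder: depends only on d and n
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z * np.minimum(1.0, radius / np.maximum(norms, 1e-12))  # truncate rows
    return Z, M_inv_sqrt                      # map for pulling solutions back later

# Heavy-tailed (unbounded) private data becomes bounded after the transform,
# so the sensitivity of Z.T @ Z is bounded and DP noise can be calibrated.
rng = np.random.default_rng(0)
X_pub = rng.normal(size=(200, 5))
X_priv = rng.standard_t(df=2, size=(1000, 5))
Z, _ = pmt_transform(X_priv, X_pub)
print(np.linalg.norm(Z, axis=1).max())        # <= radius
```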
Related papers
- Noise-Calibrated Inference from Differentially Private Sufficient Statistics in Exponential Families [0.0]
Many differentially private (DP) data release systems either output DP synthetic data or leave analysts to perform inference as usual. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing.
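As a hedged illustration of this release-then-post-process pattern (not the paper's actual procedure), the sketch below perturbs the sufficient statistic of a Bernoulli model with Laplace noise and then applies a simple noise-aware plug-in estimate; the model choice, helper names, and clipping step are assumptions.

```python
import numpy as np

def dp_sufficient_stat(x, epsilon, rng):
    # Sufficient statistic for a Bernoulli sample: the sum of bits.
    # Its sensitivity under add/remove of one record is 1.
    return x.sum() + rng.laplace(scale=1.0 / epsilon)

def noise_aware_estimate(s_noisy, n):
    # Post-processing (free under DP): clip the noisy statistic into
    # its feasible range before forming the plug-in estimate of p.
    return np.clip(s_noisy, 0.0, n) / n

rng = np.random.default_rng(1)
x = (rng.random(500) < 0.3).astype(float)
s_noisy = dp_sufficient_stat(x, epsilon=1.0, rng=rng)
print(noise_aware_estimate(s_noisy, n=len(x)))  # approx 0.3
```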
arXiv Detail & Related papers (2026-03-02T15:55:54Z)
- Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees [41.99844472131922]
In the social sciences, small- to medium-scale datasets are common and linear regression (LR) is canonical. In privacy-aware settings, much work has focused on differentially private (DP) LR, but mostly on point estimation, with limited attention to uncertainty quantification.
arXiv Detail & Related papers (2025-10-19T19:30:41Z)
- Enhancing Differentially Private Linear Regression via Public Second-Moment [2.729099903480711]
We propose a novel method that transforms private data using the public second-moment matrix to compute a transformed SSP-OLSE. We derive theoretical error bounds comparing our method and the standard SSP-OLSE to the non-DP OLSE, revealing the improved robustness and accuracy achieved by our approach.
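A rough sketch of what such a transformed SSP-style DP OLS estimator could look like is given below, assuming the usual sufficient-statistics-perturbation recipe (clip, perturb X^T X and X^T y, solve, map back); the noise scales, clipping bound, and symmetrization step are simplified assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def dp_ols_public_moment(X, y, X_pub, clip, sigma, rng):
    # Whiten with the public second-moment matrix so the transformed
    # design is well conditioned before noise is added.
    M_pub = X_pub.T @ X_pub / X_pub.shape[0]
    evals, evecs = np.linalg.eigh(M_pub)
    W = evecs @ np.diag(np.clip(evals, 1e-12, None) ** -0.5) @ evecs.T
    X_t = X @ W
    # Clip rows so the sufficient statistics have bounded sensitivity.
    norms = np.linalg.norm(X_t, axis=1, keepdims=True)
    X_t = X_t * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    d = X_t.shape[1]
    # Perturb sufficient statistics (Gaussian mechanism, symmetric noise for X^T X).
    E = rng.normal(scale=sigma, size=(d, d))
    XtX_noisy = X_t.T @ X_t + (E + E.T) / np.sqrt(2.0)
    Xty_noisy = X_t.T @ y + rng.normal(scale=sigma, size=d)
    beta_t = np.linalg.solve(XtX_noisy, Xty_noisy)
    return W @ beta_t   # map the solution back to the original domain

rng = np.random.default_rng(2)
X_pub = rng.normal(size=(100, 3))
X = rng.normal(size=(2000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=2000)
print(dp_ols_public_moment(X, y, X_pub, clip=3.0, sigma=1.0, rng=rng))
```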
arXiv Detail & Related papers (2025-08-25T13:55:46Z)
- Machine Learning with Privacy for Protected Attributes [56.44253915927481]
We refine the definition of differential privacy (DP) to create a more general and flexible framework that we call feature differential privacy (FDP). Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary separation of protected and non-protected features. We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available.
arXiv Detail & Related papers (2025-06-24T17:53:28Z)
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding machine learning datasets at industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
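The snippet below gives a rough sense of the per-instance accounting idea for a simple mean query, assuming a Gaussian mechanism: each record's leave-one-out sensitivity yields its own privacy loss, so outliers pay more. The conversion formula is the standard simplified Gaussian-mechanism bound, and the sketch ignores the privacy cost of choosing the noise scale from the data.

```python
import numpy as np

def per_instance_losses(x, sigma, delta=1e-5):
    n = len(x)
    # Leave-one-out means give each record's individual sensitivity for
    # the mean query: how far the output moves if record i is removed.
    loo_means = (x.sum() - x) / (n - 1)
    deltas = np.abs(x.mean() - loo_means)
    # Simplified Gaussian-mechanism conversion to a per-record epsilon.
    return deltas * np.sqrt(2.0 * np.log(1.25 / delta)) / sigma

x = np.random.default_rng(3).exponential(size=1000)
eps = per_instance_losses(x, sigma=0.05)
print(eps.mean(), eps.max())   # outliers incur much larger privacy loss
```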
arXiv Detail & Related papers (2024-04-24T06:51:16Z)
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method for gauging the degree of security provided to models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Public Data-Assisted Mirror Descent for Private Model Training [23.717811604829148]
We revisit the problem of using public data to improve the privacy/utility tradeoffs for differentially private (DP) model training.
We show that our algorithm not only significantly improves traditional DP-SGD and DP-FedAvg, but also improves over DP-SGD and DP-FedAvg on models that have been pre-trained with the public data.
arXiv Detail & Related papers (2021-12-01T00:21:40Z)
- Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models collaboratively among distributed clients.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
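As a minimal sketch of this countermeasure (assuming a DP-SGD-style local mechanism, not any specific paper's algorithm), each client clips its update and adds Gaussian noise before sharing, so a gradient leakage attacker observes only the noised update; the clip bound and noise multiplier below are arbitrary illustrative values.

```python
import numpy as np

def privatize_update(grad, clip_norm=1.0, noise_mult=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # Clip the update so each client's contribution has bounded sensitivity.
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / max(norm, 1e-12))
    # Add Gaussian noise locally, before the update leaves the client.
    return grad + rng.normal(scale=noise_mult * clip_norm, size=grad.shape)

# The server only ever sees (and averages) the noised updates.
rng = np.random.default_rng(4)
client_grads = [rng.normal(size=10) for _ in range(8)]
server_update = np.mean([privatize_update(g, rng=rng) for g in client_grads], axis=0)
print(np.linalg.norm(server_update))
```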
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.