Related papers: Polynomial magic! Hermite polynomials for private data generation

Polynomial magic! Hermite polynomials for private data generation

URL: http://arxiv.org/abs/2106.05042v1
Date: Wed, 9 Jun 2021 12:56:41 GMT
Title: Polynomial magic! Hermite polynomials for private data generation
Authors: Mijung Park, Margarita Vinaroz, Mohammad-Amin Charusaie, Frederik Harder
Abstract summary: Kernel mean embedding considers infinite-dimensional features, which are challenging to handle in the context of differentially private data generation. We propose to approximate the kernel mean embedding of data distribution using finite-dimensional random features, where sensitivity of the features becomes analytically tractable. Unlike the random features, the Hermite features are ordered, where the low orders contain more information on the distribution than those at the high orders.
Score: 6.7386666699567845
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Kernel mean embedding is a useful tool to compare probability measures. Despite its usefulness, kernel mean embedding considers infinite-dimensional features, which are challenging to handle in the context of differentially private data generation. A recent work proposes to approximate the kernel mean embedding of data distribution using finite-dimensional random features, where the sensitivity of the features becomes analytically tractable. More importantly, this approach significantly reduces the privacy cost, compared to other known privatization methods (e.g., DP-SGD), as the approximate kernel mean embedding of the data distribution is privatized only once and can then be repeatedly used during training of a generator without incurring any further privacy cost. However, the required number of random features is excessively high, often ten thousand to a hundred thousand, which worsens the sensitivity of the approximate kernel mean embedding. To improve the sensitivity, we propose to replace random features with Hermite polynomial features. Unlike the random features, the Hermite polynomial features are ordered, where the features at the low orders contain more information on the distribution than those at the high orders. Hence, a relatively low order of Hermite polynomial features can more accurately approximate the mean embedding of the data distribution compared to a significantly higher number of random features. As a result, using the Hermite polynomial features, we significantly improve the privacy-accuracy trade-off, reflected in the high quality and diversity of the generated data, when tested on several heterogeneous tabular datasets, as well as several image benchmark datasets.

Related papers

Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency [53.98433419539793]
We study the problem of spectral graph clustering under edge differential privacy (DP)<n>Specifically, we develop three mechanisms: (i) graph perturbation via randomized edge flipping combined with adjacency matrix shuffling, which enforces edge privacy; (ii) private graph projection with additive Gaussian noise in a lower-dimensional space to reduce dimensionality and computational complexity; and (iii) a noisy power iteration method that distributes Gaussian noise across iterations to ensure edge DP while maintaining convergence.
arXiv Detail & Related papers (2025-10-08T15:30:27Z)
Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines. We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation [32.83436754714798]
This work considers the using the features of $textitneural tangent kernels (NTKs)$, more precisely $textitempirical$ NTKs (e-NTKs) We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of the features taken from pre-trained perceptual features using public data.
arXiv Detail & Related papers (2023-03-03T03:00:49Z)
Differentially private multivariate medians [4.588028371034407]
We develop novel finite-sample performance guarantees for differentially private depth-based medians. We show that under Cauchy marginals, the cost of heavy-tailed location estimation outweighs the cost of privacy.
arXiv Detail & Related papers (2022-10-12T17:56:04Z)
Optimal Algorithms for Mean Estimation under Local Differential Privacy [55.32262879188817]
We show that PrivUnit achieves the optimal variance among a large family of locally private randomizers. We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees.
arXiv Detail & Related papers (2022-05-05T06:43:46Z)
Private measures, random walks, and synthetic data [7.5764890276775665]
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. We develop a private measure from a data set that allows us to efficiently construct private synthetic data. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables.
arXiv Detail & Related papers (2022-04-20T00:06:52Z)
Improved Random Features for Dot Product Kernels [12.321353062415701]
We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels. We show empirically that the use of complex features can significantly reduce the variances of these approximations. We develop a data-driven optimization approach to improve random feature approximations for general dot product kernels.
arXiv Detail & Related papers (2022-01-21T14:16:56Z)
Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z)
A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a doubly kernel learning algorithm for vertically partitioned data. We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more. Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation [11.312036995195594]
We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings. We exploit the random feature representations for two important benefits. Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
arXiv Detail & Related papers (2020-02-26T16:41:41Z)
Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees. Our approach leads to significantly better bounds for datasets with known rates of singular value decay. We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.