Polynomial magic! Hermite polynomials for private data generation
- URL: http://arxiv.org/abs/2106.05042v1
- Date: Wed, 9 Jun 2021 12:56:41 GMT
- Title: Polynomial magic! Hermite polynomials for private data generation
- Authors: Mijung Park, Margarita Vinaroz, Mohammad-Amin Charusaie, Frederik
Harder
- Abstract summary: Kernel mean embedding considers infinite-dimensional features, which are challenging to handle in the context of differentially private data generation.
We propose to approximate the kernel mean embedding of data distribution using finite-dimensional random features, where sensitivity of the features becomes analytically tractable.
Unlike the random features, the Hermite features are ordered, where the low orders contain more information on the distribution than those at the high orders.
- Score: 6.7386666699567845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel mean embedding is a useful tool to compare probability measures.
Despite its usefulness, kernel mean embedding considers infinite-dimensional
features, which are challenging to handle in the context of differentially
private data generation. A recent work proposes to approximate the kernel mean
embedding of data distribution using finite-dimensional random features, where
the sensitivity of the features becomes analytically tractable. More
importantly, this approach significantly reduces the privacy cost, compared to
other known privatization methods (e.g., DP-SGD), as the approximate kernel
mean embedding of the data distribution is privatized only once and can then be
repeatedly used during training of a generator without incurring any further
privacy cost. However, the required number of random features is excessively
high, often ten thousand to a hundred thousand, which worsens the sensitivity
of the approximate kernel mean embedding. To improve the sensitivity, we
propose to replace random features with Hermite polynomial features. Unlike the
random features, the Hermite polynomial features are ordered, where the
features at the low orders contain more information on the distribution than
those at the high orders. Hence, a relatively low order of Hermite polynomial
features can more accurately approximate the mean embedding of the data
distribution compared to a significantly higher number of random features. As a
result, using the Hermite polynomial features, we significantly improve the
privacy-accuracy trade-off, reflected in the high quality and diversity of the
generated data, when tested on several heterogeneous tabular datasets, as well
as several image benchmark datasets.
Related papers
- Differentially Private Neural Tangent Kernels for Privacy-Preserving
Data Generation [32.83436754714798]
This work considers the using the features of $textitneural tangent kernels (NTKs)$, more precisely $textitempirical$ NTKs (e-NTKs)
We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of the features taken from pre-trained perceptual features using public data.
arXiv Detail & Related papers (2023-03-03T03:00:49Z) - Differentially private multivariate medians [4.588028371034407]
We develop novel finite-sample performance guarantees for differentially private depth-based medians.
We show that under Cauchy marginals, the cost of heavy-tailed location estimation outweighs the cost of privacy.
arXiv Detail & Related papers (2022-10-12T17:56:04Z) - Optimal Algorithms for Mean Estimation under Local Differential Privacy [55.32262879188817]
We show that PrivUnit achieves the optimal variance among a large family of locally private randomizers.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees.
arXiv Detail & Related papers (2022-05-05T06:43:46Z) - Private measures, random walks, and synthetic data [7.5764890276775665]
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee.
We develop a private measure from a data set that allows us to efficiently construct private synthetic data.
A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables.
arXiv Detail & Related papers (2022-04-20T00:06:52Z) - Improved Random Features for Dot Product Kernels [12.321353062415701]
We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels.
We show empirically that the use of complex features can significantly reduce the variances of these approximations.
We develop a data-driven optimization approach to improve random feature approximations for general dot product kernels.
arXiv Detail & Related papers (2022-01-21T14:16:56Z) - Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs)
HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Federated Doubly Stochastic Kernel Learning for Vertically Partitioned
Data [93.76907759950608]
We propose a doubly kernel learning algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z) - A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z) - DP-MERF: Differentially Private Mean Embeddings with Random Features for
Practical Privacy-Preserving Data Generation [11.312036995195594]
We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings.
We exploit the random feature representations for two important benefits.
Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
arXiv Detail & Related papers (2020-02-26T16:41:41Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.