Privately Learning Subspaces
- URL: http://arxiv.org/abs/2106.00001v1
- Date: Fri, 28 May 2021 21:09:23 GMT
- Title: Privately Learning Subspaces
- Authors: Vikrant Singhal, Thomas Steinke
- Abstract summary: We present differentially private algorithms that take input data sampled from a low-dimensional linear subspace and output that subspace.
These algorithms can serve as a pre-processing step for other procedures.
- Score: 16.805122710333826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private data analysis suffers a costly curse of dimensionality. However, the
data often has an underlying low-dimensional structure. For example, when
optimizing via gradient descent, the gradients often lie in or near a
low-dimensional subspace. If that low-dimensional structure can be identified,
then we can avoid paying (in terms of privacy or accuracy) for the high ambient
dimension.
We present differentially private algorithms that take input data sampled
from a low-dimensional linear subspace (possibly with a small amount of error)
and output that subspace (or an approximation to it). These algorithms can
serve as a pre-processing step for other procedures.
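To make the setting concrete, here is a minimal sketch of one standard recipe for this task, not necessarily the algorithm of this paper: privatize the empirical second-moment matrix with the Gaussian mechanism and return its top-k eigenvectors. The function name, clipping scheme, and noise calibration are illustrative assumptions.

```python
import numpy as np

def dp_subspace(X, k, eps, delta, clip=1.0, rng=None):
    """Illustrative sketch: estimate a k-dim subspace under approximate DP
    by adding symmetric Gaussian noise to the second-moment matrix, then
    taking its top-k eigenvectors."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Clip rows so each point's contribution to X^T X has bounded norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xc = X * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    A = Xc.T @ Xc / n
    # Replacing one row changes A by at most 2*clip^2/n in L2 norm.
    sens = 2 * clip**2 / n
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps
    noise = rng.normal(0.0, sigma, (d, d))
    noise = (noise + noise.T) / np.sqrt(2)   # symmetrize the noise matrix
    _, vecs = np.linalg.eigh(A + noise)
    return vecs[:, -k:]                      # columns span the estimated subspace
```

When the data lie exactly on (or near) a k-dimensional subspace and n is large, the noise is small relative to the eigengap and the returned basis is close to the true subspace.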
Related papers
- On Differentially Private Subspace Estimation in a Distribution-Free Setting [3.8888996044605855]
Many datasets possess an inherent low-dimensional structure.
If the low-dimensional structure could be privately identified using a small amount of gradients, we could avoid paying for the high ambient dimension.
We provide the first bounds that quantify the cost of private subspace estimation as a function of multiplicative singular-value gaps in the input dataset.
arXiv Detail & Related papers (2024-02-09T15:17:53Z)
- Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance [35.653510597396114]
Differentially private gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters.
Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data.
We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples.
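One natural way to realize a "low-dimensional subspace distance between gradients" is via principal angles between the top-k singular subspaces of the public and private gradient matrices. The sketch below is an assumed formulation for illustration; the paper's actual metric may differ.

```python
import numpy as np

def gradient_subspace_distance(G_priv, G_pub, k):
    """Illustrative sketch: compare the top-k right-singular subspaces of
    two gradient matrices (rows = per-example gradients). Returns the
    projection-metric distance in [0, 1]: 0 for identical subspaces,
    1 for orthogonal ones."""
    def top_k_subspace(G):
        _, _, Vt = np.linalg.svd(G, full_matrices=False)
        return Vt[:k].T                       # (d, k) orthonormal basis
    V1, V2 = top_k_subspace(G_priv), top_k_subspace(G_pub)
    # Cosines of the principal angles are the singular values of V1^T V2.
    s = np.linalg.svd(V1.T @ V2, compute_uv=False)
    return float(np.sqrt(1.0 - np.min(s) ** 2))  # sin of largest principal angle
```

A public dataset whose gradient subspace gives a small distance to the private one would then be the preferred choice for projection.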
arXiv Detail & Related papers (2023-03-02T13:36:28Z)
- On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently under differential privacy than without it.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
- Side-effects of Learning from Low Dimensional Data Embedded in an Euclidean Space [3.093890460224435]
We study the potential regularization effects associated with the network's depth and with noise in the codimension of the data manifold.
We also present additional side effects in training due to the presence of noise.
arXiv Detail & Related papers (2022-03-01T16:55:51Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of the data matrix and define the context of each data point.
Experiments on data classification and clustering over eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
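The neighborhood-graph step above can be sketched minimally. The snippet below is an illustration only, not the paper's implementation: it builds a cosine-similarity k-nearest-neighbor graph over the rows of a data matrix and returns each point's neighbor indices as its "context". The function name and the choice of cosine similarity are assumptions.

```python
import numpy as np

def neighborhood_contexts(X, k):
    """Illustrative sketch: build the k-nearest-neighbor similarity graph
    of the rows of X (cosine similarity) and return, for each point, the
    indices of its k most similar neighbors as its training 'context'."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                  # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)   # exclude self-similarity
    # argsort descending similarity, keep the k closest neighbors
    return np.argsort(-S, axis=1)[:, :k]
```

Any notion of similarity (e.g. a Gaussian kernel on Euclidean distance) could be substituted; the abstract does not specify which one Vec2vec uses.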
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades utility drastically when it comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
- On the Differentially Private Nature of Perturbed Gradient Descent [15.554148012395457]
We consider the problem of empirical risk minimization over a database using gradient descent.
Perturbed gradient descent is typically employed to escape saddle points.
arXiv Detail & Related papers (2021-01-18T02:29:37Z)
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
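The noise-reduction idea described in this abstract can be sketched as a single update step. This is a minimal illustration under assumed parameter names, not the paper's Projected DP-SGD algorithm: the clipped gradient is projected onto a k-dimensional subspace (an orthonormal basis `V` assumed given), Gaussian noise is added in the k low-dimensional coordinates rather than all d ambient ones, and the result is mapped back for the update.

```python
import numpy as np

def projected_dp_sgd_step(w, grad, V, lr, clip, sigma, rng):
    """Illustrative sketch of one projected, privatized SGD step.
    V is a (d, k) matrix with orthonormal columns spanning the
    identified gradient subspace."""
    # Clip the gradient to bound per-example sensitivity.
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    z = V.T @ g                                   # k-dim projected gradient
    z_noisy = z + rng.normal(0.0, sigma * clip, size=z.shape)
    return w - lr * (V @ z_noisy)                 # update stays in the subspace
```

Because the noise lives in k dimensions instead of p, its expected norm scales with sqrt(k) rather than sqrt(p), which is the source of the claimed error improvement.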
arXiv Detail & Related papers (2020-07-07T22:31:01Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.