Differentially Private Non-convex Learning for Multi-layer Neural
Networks
- URL: http://arxiv.org/abs/2310.08425v1
- Date: Thu, 12 Oct 2023 15:48:14 GMT
- Title: Differentially Private Non-convex Learning for Multi-layer Neural
Networks
- Authors: Hanpu Shen and Cheng-Long Wang and Zihang Xiang and Yiming Ying and Di
Wang
- Abstract summary: This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node.
By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk bound when both the sample size and the width of the network are sufficiently large.
- Score: 35.24835396398768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on the problem of Differentially Private Stochastic
Optimization for (multi-layer) fully connected neural networks with a single
output node. In the first part, we examine cases with no hidden nodes,
specifically focusing on Generalized Linear Models (GLMs). We investigate the
well-specified model where the random noise possesses a zero mean, and the link
function is both bounded and Lipschitz continuous. We propose several
algorithms and our analysis demonstrates the feasibility of achieving an excess
population risk that remains invariant to the data dimension. We also delve
into the scenario involving the ReLU link function, and our findings mirror
those of the bounded link function. We conclude this section by contrasting
well-specified and misspecified models, using ReLU regression as a
representative example.
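For concreteness, the well-specified GLM setting in this first part can be written as follows; this is a standard formulation consistent with the abstract's description, and the symbols $w^*$, $\sigma$, and $\zeta$ are illustrative rather than the paper's exact notation:
$$
y = \sigma(\langle w^*, x \rangle) + \zeta, \qquad \mathbb{E}[\zeta \mid x] = 0,
$$
where the link function $\sigma$ is bounded and Lipschitz continuous (or ReLU in the second scenario), and the goal is a differentially private estimator whose excess population risk does not grow with the data dimension.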
In the second part of the paper, we extend our ideas to two-layer neural
networks with sigmoid or ReLU activation functions in the well-specified model.
In the third part, we study the theoretical guarantees of DP-SGD in Abadi et
al. (2016) for fully connected multi-layer neural networks. By utilizing recent
advances in Neural Tangent Kernel theory, we provide the first excess
population risk bound when both the sample size and the width of the network are
sufficiently large. Additionally, we discuss the role of some parameters in
DP-SGD regarding their utility, both theoretically and empirically.
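The third part analyzes the utility of DP-SGD from Abadi et al. (2016). As a reference point, the sketch below shows the core of that algorithm (clip each per-example gradient, add Gaussian noise, then take a gradient step); the linear least-squares model and all hyperparameter values are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal NumPy sketch of the DP-SGD update from Abadi et al. (2016):
# per-example gradients are clipped to L2 norm at most clip_norm, summed,
# and perturbed with Gaussian noise before the parameter step.
# The linear model, squared loss, and hyperparameters are illustrative only.
import numpy as np

def dp_sgd(X, y, epochs=5, batch_size=64, lr=0.1,
           clip_norm=1.0, noise_mult=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            Xb, yb = X[idx], y[idx]
            # Per-example gradients of the squared loss 0.5 * (x^T w - y)^2.
            residuals = Xb @ w - yb               # shape (B,)
            grads = residuals[:, None] * Xb       # shape (B, d)
            # Clip each per-example gradient to L2 norm at most clip_norm.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Gaussian noise with standard deviation noise_mult * clip_norm.
            noise = noise_mult * clip_norm * rng.standard_normal(d)
            # Average the noisy summed gradient over the batch and step.
            w -= lr * (grads.sum(axis=0) + noise) / len(idx)
    return w

# Toy usage on synthetic well-specified data: y = <w*, x> + zero-mean noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((512, 10))
w_star = rng.standard_normal(10)
y = X @ w_star + 0.1 * rng.standard_normal(512)
w_hat = dp_sgd(X, y)
```

The privacy guarantee of this update comes from the Gaussian mechanism applied to the clipped gradient sum, with the overall (epsilon, delta) budget tracked across iterations by the moments accountant of Abadi et al.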
Related papers
- Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression: A Distribution-Free Analysis [19.988762532185884]
We show that, if the neural network is trained by GD with early stopping, then the trained network achieves a sharp rate of $\mathcal{O}(\epsilon_n^2)$ for the nonparametric regression risk.
Notably, our result does not require distributional assumptions on the training data.
arXiv Detail & Related papers (2024-11-05T08:43:54Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - How (Implicit) Regularization of ReLU Neural Networks Characterizes the
Learned Function -- Part II: the Multi-D Case of Two Layers with Random First
Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Non-Vacuous Generalisation Bounds for Shallow Neural Networks [5.799808780731661]
We focus on a specific class of shallow neural networks with a single hidden layer.
We derive new generalisation bounds through the PAC-Bayesian theory.
Our bounds are empirically non-vacuous when the network is trained with vanilla gradient descent on MNIST and Fashion-MNIST.
arXiv Detail & Related papers (2022-02-03T14:59:51Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Non-asymptotic Excess Risk Bounds for Classification with Deep
Convolutional Neural Networks [6.051520664893158]
We consider the problem of binary classification with a class of general deep convolutional neural networks.
We define the prefactors of the risk bounds in terms of the input data dimension and other model parameters.
We show that the classification methods with CNNs can circumvent the curse of dimensionality.
arXiv Detail & Related papers (2021-05-01T15:55:04Z) - Globally Injective ReLU Networks [20.106755410331576]
Injectivity plays an important role in generative models where it enables inference.
We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks.
arXiv Detail & Related papers (2020-06-15T15:12:12Z)