Differentially Private Non-convex Learning for Multi-layer Neural
Networks
- URL: http://arxiv.org/abs/2310.08425v1
- Date: Thu, 12 Oct 2023 15:48:14 GMT
- Title: Differentially Private Non-convex Learning for Multi-layer Neural
Networks
- Authors: Hanpu Shen and Cheng-Long Wang and Zihang Xiang and Yiming Ying and Di
Wang
- Abstract summary: This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node.
By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk bound when both the sample size and the width of the network are sufficiently large.
- Score: 35.24835396398768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on the problem of Differentially Private Stochastic
Optimization for (multi-layer) fully connected neural networks with a single
output node. In the first part, we examine cases with no hidden nodes,
specifically focusing on Generalized Linear Models (GLMs). We investigate the
well-specified model where the random noise possesses a zero mean, and the link
function is both bounded and Lipschitz continuous. We propose several
algorithms and our analysis demonstrates the feasibility of achieving an excess
population risk that remains invariant to the data dimension. We also delve
into the scenario involving the ReLU link function, and our findings mirror
those of the bounded link function. We conclude this section by contrasting
well-specified and misspecified models, using ReLU regression as a
representative example.
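For concreteness, the well-specified GLM setting in this first part can be written as follows; this is a standard formulation consistent with the abstract's description, and the symbols $w^*$, $\sigma$, and $\zeta$ are illustrative rather than the paper's exact notation:
$$
y = \sigma(\langle w^*, x \rangle) + \zeta, \qquad \mathbb{E}[\zeta \mid x] = 0,
$$
where the link function $\sigma$ is bounded and Lipschitz continuous (or ReLU in the second scenario), and the goal is a differentially private estimator whose excess population risk does not grow with the data dimension.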
In the second part of the paper, we extend our ideas to two-layer neural
networks with sigmoid or ReLU activation functions in the well-specified model.
In the third part, we study the theoretical guarantees of DP-SGD in Abadi et
al. (2016) for fully connected multi-layer neural networks. By utilizing recent
advances in Neural Tangent Kernel theory, we provide the first excess
population risk bound when both the sample size and the width of the network are
sufficiently large. Additionally, we discuss the role of some parameters in
DP-SGD regarding their utility, both theoretically and empirically.
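The third part analyzes the utility of DP-SGD from Abadi et al. (2016). As a reference point, the sketch below shows the core of that algorithm (clip each per-example gradient, add Gaussian noise, then take a gradient step); the linear least-squares model and all hyperparameter values are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal NumPy sketch of the DP-SGD update from Abadi et al. (2016):
# per-example gradients are clipped to L2 norm at most clip_norm, summed,
# and perturbed with Gaussian noise before the parameter step.
# The linear model, squared loss, and hyperparameters are illustrative only.
import numpy as np

def dp_sgd(X, y, epochs=5, batch_size=64, lr=0.1,
           clip_norm=1.0, noise_mult=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            Xb, yb = X[idx], y[idx]
            # Per-example gradients of the squared loss 0.5 * (x^T w - y)^2.
            residuals = Xb @ w - yb               # shape (B,)
            grads = residuals[:, None] * Xb       # shape (B, d)
            # Clip each per-example gradient to L2 norm at most clip_norm.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Gaussian noise with standard deviation noise_mult * clip_norm.
            noise = noise_mult * clip_norm * rng.standard_normal(d)
            # Average the noisy summed gradient over the batch and step.
            w -= lr * (grads.sum(axis=0) + noise) / len(idx)
    return w

# Toy usage on synthetic well-specified data: y = <w*, x> + zero-mean noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((512, 10))
w_star = rng.standard_normal(10)
y = X @ w_star + 0.1 * rng.standard_normal(512)
w_hat = dp_sgd(X, y)
```

The privacy guarantee of this update comes from the Gaussian mechanism applied to the clipped gradient sum, with the overall (epsilon, delta) budget tracked across iterations by the moments accountant of Abadi et al.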
Related papers
- Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression: A Distribution-Free Analysis [19.988762532185884]
We show that, if the neural network is trained by GD with early stopping, then the trained network achieves a sharp rate of $\mathcal{O}(\epsilon_n^2)$ for the nonparametric regression risk.
Notably, our result does not require distributional assumptions on the training data.
arXiv Detail & Related papers (2024-11-05T08:43:54Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - How (Implicit) Regularization of ReLU Neural Networks Characterizes the
Learned Function -- Part II: the Multi-D Case of Two Layers with Random First
Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Non-Vacuous Generalisation Bounds for Shallow Neural Networks [5.799808780731661]
We focus on a specific class of shallow neural networks with a single hidden layer.
We derive new generalisation bounds through the PAC-Bayesian theory.
Our bounds are empirically non-vacuous when the network is trained with vanilla gradient descent on MNIST and Fashion-MNIST.
arXiv Detail & Related papers (2022-02-03T14:59:51Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Non-asymptotic Excess Risk Bounds for Classification with Deep
Convolutional Neural Networks [6.051520664893158]
We consider the problem of binary classification with a class of general deep convolutional neural networks.
We define the prefactors of the risk bounds in terms of the input data dimension and other model parameters.
We show that the classification methods with CNNs can circumvent the curse of dimensionality.
arXiv Detail & Related papers (2021-05-01T15:55:04Z) - Globally Injective ReLU Networks [20.106755410331576]
Injectivity plays an important role in generative models where it enables inference.
We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks.
arXiv Detail & Related papers (2020-06-15T15:12:12Z)