Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric
Regression by Adversarial Training
- URL: http://arxiv.org/abs/2307.04042v1
- Date: Sat, 8 Jul 2023 20:24:14 GMT
- Title: Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric
Regression by Adversarial Training
- Authors: Masaaki Imaizumi
- Abstract summary: We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme.
A deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction.
- Score: 5.68558935178946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show the sup-norm convergence of deep neural network estimators with a
novel adversarial training scheme. For the nonparametric regression problem, it
has been shown that an estimator using deep neural networks can achieve better
performance in the sense of the $L^2$-norm. In contrast, it is difficult for
the neural estimator with least-squares to achieve the sup-norm convergence,
due to the deep structure of neural network models. In this study, we develop
an adversarial training scheme and investigate the sup-norm convergence of deep
neural network estimators. First, we find that ordinary adversarial training
makes neural estimators inconsistent. Second, we show that a deep neural
network estimator achieves the optimal rate in the sup-norm sense by the
proposed adversarial training with correction. We extend our adversarial
training to general setups of a loss function and a data-generating function.
Our experiments support the theoretical findings.
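The gap between $L^2$-norm and sup-norm convergence that the abstract refers to can be illustrated with a small numerical sketch. This is not the paper's estimator or training scheme. The regression function `true_fn`, the bump-shaped `make_estimate`, and the grid size are all hypothetical choices made only for illustration. The sketch shows an estimate whose error sits on a shrinking interval: its $L^2$ error tends to zero while its sup-norm error stays at 1.

```python
import math


def true_fn(x):
    # Hypothetical regression function, chosen only for illustration.
    return math.sin(2.0 * math.pi * x)


def make_estimate(width):
    # Hypothetical estimate: equals true_fn except for a bump of height 1
    # on an interval of the given width centred at x = 0.5.
    def estimate(x):
        bump = 1.0 if abs(x - 0.5) < width / 2.0 else 0.0
        return true_fn(x) + bump
    return estimate


def grid(n_grid):
    # Midpoints of n_grid equal cells covering [0, 1].
    return [(i + 0.5) / n_grid for i in range(n_grid)]


def l2_error(f, g, n_grid=10_000):
    # Midpoint-rule approximation of the L2 distance on [0, 1].
    total = sum((f(x) - g(x)) ** 2 for x in grid(n_grid))
    return math.sqrt(total / n_grid)


def sup_error(f, g, n_grid=10_000):
    # Grid approximation of the sup-norm distance on [0, 1].
    return max(abs(f(x) - g(x)) for x in grid(n_grid))


# Map each bump width to its (L2 error, sup-norm error) pair.
errors = {}
for width in (0.1, 0.01, 0.001):
    est = make_estimate(width)
    errors[width] = (l2_error(est, true_fn), sup_error(est, true_fn))
    print(f"width={width}: L2={errors[width][0]:.4f}, sup={errors[width][1]:.4f}")
```

As the bump narrows, the $L^2$ error shrinks roughly like the square root of the width, while the sup-norm error does not move. A sup-norm guarantee rules out this kind of localized failure, which is why it is a stronger requirement than an $L^2$ guarantee.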
Related papers
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Global quantitative robustness of regression feed-forward neural networks [0.0]
We adapt the notion of the regression breakdown point to regression neural networks.
We compare the performance, measured by the out-of-sample loss, by a proxy of the breakdown rate.
The results motivate the use of robust loss functions for neural network training.
arXiv Detail & Related papers (2022-11-18T09:57:53Z)
- Can pruning improve certified robustness of neural networks? [106.03070538582222]
We show that neural network pruning can improve empirical robustness of deep neural networks (NNs).
Our experiments show that by appropriately pruning an NN, its certified accuracy can be boosted up to 8.2% under standard training.
We additionally observe the existence of certified lottery tickets that can match both standard and certified robust accuracies of the original dense models.
arXiv Detail & Related papers (2022-06-15T05:48:51Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- A Kernel-Expanded Stochastic Neural Network [10.837308632004644]
Deep neural networks often get trapped in local minima during training.
The new kernel-expanded stochastic neural network (K-StoNet) model reformulates the network as a latent variable model.
The model can be easily trained using the imputation-regularized optimization (IRO) algorithm.
arXiv Detail & Related papers (2022-01-14T06:42:42Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models.
We use a fully connected feed-forward neural network to approximate the regression function $f(x)$.
We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)
- A Deep Conditioning Treatment of Neural Networks [37.192369308257504]
We show that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data.
We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers via the neural tangent kernel.
arXiv Detail & Related papers (2020-02-04T20:21:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.