Improve the Robustness and Accuracy of Deep Neural Network with
$L_{2,\infty}$ Normalization
- URL: http://arxiv.org/abs/2010.04912v1
- Date: Sat, 10 Oct 2020 05:45:45 GMT
- Title: Improve the Robustness and Accuracy of Deep Neural Network with
$L_{2,\infty}$ Normalization
- Authors: Lijia Yu and Xiao-Shan Gao
- Abstract summary: The robustness and accuracy of the deep neural network (DNN) are enhanced by introducing the $L_{2,\infty}$ normalization.
It is proved that the $L_{2,\infty}$ normalization leads to large dihedral angles between two adjacent faces of the polyhedron graph of the DNN function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, the robustness and accuracy of the deep neural network (DNN)
are enhanced by introducing the $L_{2,\infty}$ normalization of the weight
matrices of the DNN with ReLU as the activation function. It is proved that the
$L_{2,\infty}$ normalization leads to large dihedral angles between two
adjacent faces of the polyhedron graph of the DNN function and hence to
smoother DNN functions, which reduces over-fitting. A measure is proposed for
the robustness of a classification DNN, namely the average radius of the
maximal robust spheres centered at the sample data. A lower bound for this
robustness measure is given in terms of the $L_{2,\infty}$ norm. Furthermore,
an upper bound for the Rademacher complexity of DNNs with $L_{2,\infty}$
normalization is given. An algorithm is given to train a DNN with the
$L_{2,\infty}$ normalization, and experimental results show that the
$L_{2,\infty}$ normalization is effective in improving both robustness and
accuracy.
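The abstract does not spell out the training algorithm, so the following is only an illustrative sketch. It assumes the common convention $\|W\|_{2,\infty} = \max_i \|W_{i,\cdot}\|_2$ (the largest row-wise $L_2$ norm of a weight matrix) and enforces $\|W\|_{2,\infty} \le c$ by rescaling over-long rows after every optimizer step; the bound $c$, the projection schedule, and the helper name project_l2_inf_ are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of L_{2,infty}-constrained training for a ReLU network (PyTorch).
# After each gradient step, every row of each linear layer's weight matrix is
# projected onto the L2 ball of radius c, so that max_i ||W_i||_2 <= c.
import torch
import torch.nn as nn
import torch.nn.functional as F


def project_l2_inf_(model: nn.Module, c: float = 1.0) -> None:
    """In-place projection: shrink any weight-matrix row whose L2 norm exceeds c."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight                            # (out_features, in_features)
                row_norms = w.norm(dim=1, keepdim=True)      # (out_features, 1)
                scale = torch.clamp(row_norms / c, min=1.0)  # only shrink, never grow
                w.div_(scale)


# Toy usage on dummy data, for illustration only.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))

for _ in range(100):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    project_l2_inf_(model, c=1.0)   # keep ||W||_{2,infty} <= c after every step
```

Projecting (rather than adding a soft penalty) keeps the constraint satisfied exactly throughout training, which is the form in which the $L_{2,\infty}$ norm enters the abstract's robustness lower bound and Rademacher-complexity bound; a penalty term on $\|W\|_{2,\infty}$ would be an alternative design.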
Related papers
- Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods [43.32546195968771]
We study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation.
Our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds.
We show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution.
arXiv Detail & Related papers (2024-10-13T21:49:29Z)
- Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval [49.825549809652436]
$k$NN-MT constructs an external datastore to store domain-specific translation knowledge.
Adaptive retrieval ($k$NN-MT-AR) dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ is less than a fixed threshold (sketched after this entry).
We propose dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects.
arXiv Detail & Related papers (2024-06-10T07:36:55Z)
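The adaptive-retrieval rule above is concrete enough to sketch. Below is a schematic (not the paper's code) of the $\lambda$-threshold gating in $k$NN-MT-AR: a gate estimates the interpolation weight $\lambda$ for the current decoding step, and the costly datastore lookup is skipped whenever $\lambda$ falls below a fixed threshold. The function names, callable interfaces, and threshold value are illustrative assumptions.

```python
# Schematic of lambda-threshold gating for kNN-MT adaptive retrieval (illustrative).
import numpy as np


def knn_mt_ar_step(nmt_probs: np.ndarray,
                   estimate_lambda,        # callable: decoder state -> float in [0, 1]
                   knn_lookup,             # callable: decoder state -> vocab-sized probs
                   state,
                   threshold: float = 0.25) -> np.ndarray:
    """Return the next-token distribution, querying the datastore only when useful."""
    lam = estimate_lambda(state)
    if lam < threshold:                     # gate: skip the expensive kNN retrieval
        return nmt_probs
    knn_probs = knn_lookup(state)           # nearest-neighbor lookup in the datastore
    return (1.0 - lam) * nmt_probs + lam * knn_probs   # standard kNN-MT interpolation
```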
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs).
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z)
- A multiobjective continuation method to compute the regularization path of deep neural networks [1.3654846342364308]
Sparsity is a highly desirable feature in deep neural networks (DNNs), since it ensures numerical efficiency and improves the interpretability and robustness of models.
We present an algorithm that allows for approximating the entire sparse front for the above-mentioned objectives in a very efficient manner for high-dimensional problems with millions of parameters.
We demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
arXiv Detail & Related papers (2023-08-23T10:08:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Quasi-optimal $hp$-finite element refinements towards singularities via deep neural network prediction [0.3149883354098941]
We show how to construct the deep neural network expert to predict quasi-optimal $hp$-refinements for a given computational problem.
For the training, we use a self-adaptive $hp$-FEM algorithm based on the two-grid paradigm.
We show that the exponential convergence delivered by the self-adaptive $hp$-FEM can be preserved if we continue refinements with a properly trained DNN expert.
arXiv Detail & Related papers (2022-09-13T09:45:57Z)
- Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Approximating smooth functions by deep neural networks with sigmoid activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function.
We show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.