Improve the Robustness and Accuracy of Deep Neural Network with
$L_{2,\infty}$ Normalization
- URL: http://arxiv.org/abs/2010.04912v1
- Date: Sat, 10 Oct 2020 05:45:45 GMT
- Title: Improve the Robustness and Accuracy of Deep Neural Network with
$L_{2,\infty}$ Normalization
- Authors: Lijia Yu and Xiao-Shan Gao
- Abstract summary: The robustness and accuracy of the deep neural network (DNN) are enhanced by introducing the $L_{2,\infty}$ normalization.
It is proved that the $L_{2,\infty}$ normalization leads to large dihedral angles between two adjacent faces of the polyhedron graph of the DNN function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, the robustness and accuracy of the deep neural network (DNN)
are enhanced by introducing the $L_{2,\infty}$ normalization of the weight
matrices of the DNN with ReLU as the activation function. It is proved that the
$L_{2,\infty}$ normalization leads to large dihedral angles between two
adjacent faces of the polyhedron graph of the DNN function and hence to
smoother DNN functions, which reduces over-fitting. A measure is proposed for
the robustness of a classification DNN, namely the average radius of the
maximal robust spheres centered at the sample data. A lower bound for this
robustness measure is given in terms of the $L_{2,\infty}$ norm. Furthermore,
an upper bound for the Rademacher complexity of DNNs with $L_{2,\infty}$
normalization is given. An algorithm is given to train a DNN with the
$L_{2,\infty}$ normalization, and experimental results show that the
$L_{2,\infty}$ normalization is effective in improving both robustness and
accuracy.
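The abstract does not spell out the training algorithm, so the following is only an illustrative sketch. It assumes the common convention $\|W\|_{2,\infty} = \max_i \|W_{i,\cdot}\|_2$ (the largest row-wise $L_2$ norm of a weight matrix) and enforces $\|W\|_{2,\infty} \le c$ by rescaling over-long rows after every optimizer step; the bound $c$, the projection schedule, and the helper name project_l2_inf_ are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of L_{2,infty}-constrained training for a ReLU network (PyTorch).
# After each gradient step, every row of each linear layer's weight matrix is
# projected onto the L2 ball of radius c, so that max_i ||W_i||_2 <= c.
import torch
import torch.nn as nn
import torch.nn.functional as F


def project_l2_inf_(model: nn.Module, c: float = 1.0) -> None:
    """In-place projection: shrink any weight-matrix row whose L2 norm exceeds c."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight                            # (out_features, in_features)
                row_norms = w.norm(dim=1, keepdim=True)      # (out_features, 1)
                scale = torch.clamp(row_norms / c, min=1.0)  # only shrink, never grow
                w.div_(scale)


# Toy usage on dummy data, for illustration only.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))

for _ in range(100):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    project_l2_inf_(model, c=1.0)   # keep ||W||_{2,infty} <= c after every step
```

Projecting (rather than adding a soft penalty) keeps the constraint satisfied exactly throughout training, which is the form in which the $L_{2,\infty}$ norm enters the abstract's robustness lower bound and Rademacher-complexity bound; a penalty term on $\|W\|_{2,\infty}$ would be an alternative design.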
Related papers
- Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods [43.32546195968771]
We study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation.
Our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds.
We show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution.
arXiv Detail & Related papers (2024-10-13T21:49:29Z)
- Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval [49.825549809652436]
$k$NN-MT constructs an external datastore to store domain-specific translation knowledge.
Adaptive retrieval ($k$NN-MT-AR) dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ is less than a fixed threshold (sketched after this entry).
We propose dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects.
arXiv Detail & Related papers (2024-06-10T07:36:55Z)
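The adaptive-retrieval rule above is concrete enough to sketch. Below is a schematic (not the paper's code) of the $\lambda$-threshold gating in $k$NN-MT-AR: a gate estimates the interpolation weight $\lambda$ for the current decoding step, and the costly datastore lookup is skipped whenever $\lambda$ falls below a fixed threshold. The function names, callable interfaces, and threshold value are illustrative assumptions.

```python
# Schematic of lambda-threshold gating for kNN-MT adaptive retrieval (illustrative).
import numpy as np


def knn_mt_ar_step(nmt_probs: np.ndarray,
                   estimate_lambda,        # callable: decoder state -> float in [0, 1]
                   knn_lookup,             # callable: decoder state -> vocab-sized probs
                   state,
                   threshold: float = 0.25) -> np.ndarray:
    """Return the next-token distribution, querying the datastore only when useful."""
    lam = estimate_lambda(state)
    if lam < threshold:                     # gate: skip the expensive kNN retrieval
        return nmt_probs
    knn_probs = knn_lookup(state)           # nearest-neighbor lookup in the datastore
    return (1.0 - lam) * nmt_probs + lam * knn_probs   # standard kNN-MT interpolation
```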
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs).
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z)
- A multiobjective continuation method to compute the regularization path of deep neural networks [1.3654846342364308]
Sparsity is a highly desirable feature in deep neural networks (DNNs), since it ensures numerical efficiency and improves the interpretability and robustness of models.
We present an algorithm that allows for approximating the entire sparse front for the above-mentioned objectives in a very efficient manner for high-dimensional problems with millions of parameters.
We demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
arXiv Detail & Related papers (2023-08-23T10:08:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Quasi-optimal $hp$-finite element refinements towards singularities via deep neural network prediction [0.3149883354098941]
We show how to construct the deep neural network expert to predict quasi-optimal $hp$-refinements for a given computational problem.
For the training, we use a self-adaptive $hp$-FEM algorithm based on the two-grid paradigm.
We show that the exponential convergence delivered by the self-adaptive $hp$-FEM can be preserved if we continue refinements with a properly trained DNN expert.
arXiv Detail & Related papers (2022-09-13T09:45:57Z)
- Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Approximating smooth functions by deep neural networks with sigmoid activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function.
We show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.