Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization
- URL: http://arxiv.org/abs/2404.19112v1
- Date: Mon, 29 Apr 2024 21:25:25 GMT
- Title: Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization
- Authors: Aditya Biswas
- Abstract summary: We show how PSiLON Net's design drastically simplifies the 1-path-norm.
We propose a pruning method to achieve exact sparsity in the final stages of training.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PSiLON Net, an MLP architecture that uses $L_1$ weight normalization for each weight vector and shares the length parameter across the layer. The 1-path-norm provides a bound for the Lipschitz constant of a neural network and reflects on its generalizability, and we show how PSiLON Net's design drastically simplifies the 1-path-norm, while providing an inductive bias towards efficient learning and near-sparse parameters. We propose a pruning method to achieve exact sparsity in the final stages of training, if desired. To exploit the inductive bias of residual networks, we present a simplified residual block, leveraging concatenated ReLU activations. For networks constructed with such blocks, we prove that considering only a subset of possible paths in the 1-path-norm is sufficient to bound the Lipschitz constant. Using the 1-path-norm and this improved bound as regularizers, we conduct experiments in the small data regime using overparameterized PSiLON Nets and PSiLON ResNets, demonstrating reliable optimization and strong performance.
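As I read the abstract, the core mechanism is: each neuron's incoming weight vector is $L_1$-normalized and rescaled by a length parameter shared across the layer, which makes every row of $|W^{(l)}|$ sum to the same $g_l$ and collapses the 1-path-norm $\mathbf{1}^\top |W^{(L)}| \cdots |W^{(1)}| \mathbf{1}$ into a product of per-layer lengths. A minimal PyTorch sketch of that idea follows; all names are illustrative, biases are omitted from the bound for brevity, and the pruning scheme and CReLU residual blocks are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L1NormLinear(nn.Module):
    """Linear layer with per-neuron L1 weight normalization and one
    length parameter shared across the layer (an illustrative sketch
    of the PSiLON parameterization, not the authors' code)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features))
        self.log_g = nn.Parameter(torch.zeros(()))  # shared length g = exp(log_g) > 0
        self.bias = nn.Parameter(torch.zeros(out_features))

    @property
    def weight(self) -> torch.Tensor:
        # Rescale each row (one neuron's incoming weights) to L1 norm g.
        row_l1 = self.v.abs().sum(dim=1, keepdim=True).clamp_min(1e-12)
        return torch.exp(self.log_g) * self.v / row_l1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight, self.bias)

def one_path_norm(layers: list) -> torch.Tensor:
    """1-path-norm 1^T |W_L|...|W_1| 1 under the parameterization above.
    Every row of |W_l| sums to g_l, so the product telescopes to
    out_dim * prod_l g_l (bias paths ignored for simplicity)."""
    g_prod = torch.stack([l.log_g for l in layers]).sum().exp()
    out_dim = layers[-1].v.shape[0]
    return out_dim * g_prod
```

Under this sketch, adding `lam * one_path_norm(layers)` to the training loss penalizes the Lipschitz bound through just one scalar per layer, which is the simplification the abstract refers to.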
Related papers
- Decoupled Weight Decay for Any $p$ Norm
We consider a simple yet effective approach to sparsification, based on Bridge, or $L_p$, regularization during training.
We introduce a novel weight decay scheme, which generalizes the standard $L_2$ weight decay to any $p$ norm.
We empirically demonstrate that it leads to highly sparse networks, while maintaining performance comparable to standard $L_2$ regularization.
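A minimal sketch of what a decoupled $L_p$ decay step might look like, following the AdamW decoupling pattern (the decay is applied separately from the loss gradient); this is my reading of the idea, not the paper's exact scheme, and the function name and `eps` guard are mine:

```python
import torch

def decoupled_lp_decay_(params, lr: float, weight_decay: float,
                        p: float = 0.5, eps: float = 1e-8) -> None:
    """In-place decoupled L_p weight decay (illustrative sketch).
    The decay direction is the (sub)gradient of ||w||_p^p, applied
    after the optimizer's own update rather than folded into the loss."""
    with torch.no_grad():
        for w in params:
            # d/dw |w|^p = p * sign(w) * |w|^(p-1); eps keeps p < 1 finite at w = 0
            w.sub_(lr * weight_decay * p * w.sign() * (w.abs() + eps).pow(p - 1))
```

Called once per step after `optimizer.step()`; for p < 1 the update shrinks small weights hardest, which is what drives the sparsity the abstract reports.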
arXiv Detail & Related papers (2024-04-16T18:02:15Z)
- Improved techniques for deterministic l2 robustness
Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the $l_2$ norm is useful for adversarial robustness, interpretable gradients and stable training.
We introduce a procedure to certify robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer MLP.
We significantly advance the state-of-the-art for standard and provable robust accuracies on CIFAR-10 and CIFAR-100.
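For context, the generic certificate these Lipschitz-constrained works build on is the margin bound: an $L$-Lipschitz classifier cannot change its prediction within an $l_2$ radius of the top-two logit margin divided by $\sqrt{2}L$. A short sketch of that standard bound (not this paper's specific certification procedure):

```python
import torch

def certified_l2_radius(logits: torch.Tensor, lip_const: float = 1.0) -> torch.Tensor:
    """Standard margin certificate for an L-Lipschitz classifier:
    the prediction cannot flip within radius (top1 - top2) / (sqrt(2) * L)."""
    top2 = logits.topk(2, dim=-1).values
    margin = top2[..., 0] - top2[..., 1]
    return margin / (2.0 ** 0.5 * lip_const)
```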
arXiv Detail & Related papers (2022-11-15T19:10:12Z)
- Rethinking Lipschitz Neural Networks for Certified L-infinity Robustness
We study certified $\ell_\infty$ robustness from a novel perspective of representing Boolean functions.
We develop a unified Lipschitz network that generalizes prior works, and design a practical version that can be efficiently trained.
arXiv Detail & Related papers (2022-10-04T17:55:27Z)
- The Sample Complexity of One-Hidden-Layer Neural Networks
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z)
- Skew Orthogonal Convolutions
Training convolutional neural networks with a Lipschitz constraint under the $l_2$ norm is useful for provable adversarial robustness, interpretable gradients, stable training, etc.
The proposed skew orthogonal convolution (SOC) allows us to train provably Lipschitz, large convolutional neural networks significantly faster than prior works.
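The algebraic fact SOC builds on, shown in toy dense-matrix form (the paper applies the convolutional analogue; this is not their algorithm):

```python
import torch

def orthogonal_from_skew(A: torch.Tensor) -> torch.Tensor:
    """The matrix exponential of a skew-symmetric matrix is orthogonal:
    exp(S)^T exp(S) = exp(-S) exp(S) = I, so the map is 1-Lipschitz."""
    S = A - A.T                         # skew-symmetric part: S^T = -S
    return torch.linalg.matrix_exp(S)   # orthogonal matrix
```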
arXiv Detail & Related papers (2021-05-24T17:11:44Z)
- Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Spectral-normalized identity priors (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
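A sketch of the underlying idea, with the penalty form chosen by me for illustration (SNIP's exact penalty differs): pushing a residual branch F toward zero makes x + F(x) an identity map, so whole modules whose branch norm collapses can be removed structurally.

```python
import torch

def identity_prior_penalty(branch_weights) -> torch.Tensor:
    """Illustrative identity-prior penalty for a residual branch:
    sum of spectral norms (largest singular values) of the branch's
    weight matrices, driving the branch toward the zero map."""
    return sum(torch.linalg.matrix_norm(W, ord=2) for W in branch_weights)
```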
arXiv Detail & Related papers (2020-10-05T05:40:56Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
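To make the invariance concrete: with homogeneous activations, rescaling one layer's weights by c and another's by 1/c leaves the network function unchanged, and a product of per-layer norms is invariant to exactly such shifts. A sketch of one such penalty (illustrative, not the paper's exact regularizer):

```python
import torch

def scale_shift_invariant_reg(layer_weights, p: float = 2.0) -> torch.Tensor:
    """Product of per-layer p-norms: rescaling W_i by c and W_j by 1/c
    leaves this penalty unchanged, unlike plain weight decay."""
    reg = torch.ones(())
    for W in layer_weights:
        reg = reg * W.norm(p)
    return reg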
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
- Learning Sparse Filters in Deep Convolutional Neural Networks with a l1/l2 Pseudo-Norm
Deep neural networks (DNNs) have proven to be efficient for numerous tasks, but come at a high memory and computation cost.
Recent research has shown that their structure can be more compact without compromising their performance.
We present a sparsity-inducing regularization term based on the ratio l1/l2 pseudo-norm defined on the filter coefficients.
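A short sketch of such a ratio penalty (the per-filter formulation and the reduction over filters are my choices): the l1/l2 ratio is scale-invariant and is smallest for sparse coefficient vectors, so penalizing it drives individual filters toward sparsity without shrinking them wholesale.

```python
import torch

def l1_over_l2(filters: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Sparsity-inducing l1/l2 ratio pseudo-norm, one ratio per filter."""
    flat = filters.flatten(start_dim=1)                  # one row per filter
    ratio = flat.abs().sum(dim=1) / flat.norm(dim=1).clamp_min(eps)
    return ratio.sum()
```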
arXiv Detail & Related papers (2020-07-20T11:56:12Z)
- Efficient Proximal Mapping of the 1-path-norm of Shallow Networks
We show two new important properties of the 1-path-norm of shallow neural networks.
First, despite its non-smoothness and non-convexity, it admits a closed-form proximal operator that can be computed efficiently.
Second, when the activation functions are differentiable, it provides an upper bound on the Lipschitz constant of the network.
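For reference, the quantity in question for a one-hidden-layer network (the closed-form prox itself is not reproduced here):

```python
import torch

def one_path_norm_shallow(W: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """1-path-norm of f(x) = v^T sigma(W x): the sum over every
    input-to-output path of the product of absolute weights along it,
    i.e. sum over j, i of |v_j| * |W_ji|."""
    return (v.abs().unsqueeze(1) * W.abs()).sum()
```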
arXiv Detail & Related papers (2020-07-02T10:34:06Z)
- On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory
Lipschitz regularity is established as a key property of modern deep learning.
Computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard.
We introduce a new upper bound for convolutional layers that is both tight and easy to compute.
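A toy single-channel illustration of the Toeplitz viewpoint (the paper's bound handles multi-channel kernels): a circular convolution is a circulant operator, diagonalized by the DFT, so its spectral norm is the largest modulus of the zero-padded kernel's 2-D FFT.

```python
import torch

def circulant_conv_spectral_norm(kernel: torch.Tensor, n: int) -> torch.Tensor:
    """Exact spectral norm of an n x n single-channel circular
    convolution with the given 2-D kernel (illustrative only)."""
    padded = torch.zeros(n, n, dtype=kernel.dtype)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    return torch.fft.fft2(padded).abs().max()
```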
arXiv Detail & Related papers (2020-06-15T13:23:34Z)