Effective Neural Network $L_0$ Regularization With BinMask
- URL: http://arxiv.org/abs/2304.11237v1
- Date: Fri, 21 Apr 2023 20:08:57 GMT
- Title: Effective Neural Network $L_0$ Regularization With BinMask
- Authors: Kai Jia, Martin Rinard
- Abstract summary: We show that a straightforward formulation, BinMask, is an effective $L_0$ regularizer.
We evaluate BinMask on three tasks: feature selection, network sparsification, and model regularization.
- Score: 15.639601066641099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $L_0$ regularization of neural networks is a fundamental problem. In addition
to regularizing models for better generalizability, $L_0$ regularization also
applies to selecting input features and training sparse neural networks. There
is a large body of research on related topics, some with quite complicated
methods. In this paper, we show that a straightforward formulation, BinMask,
which multiplies weights with deterministic binary masks and uses the identity
straight-through estimator for backpropagation, is an effective $L_0$
regularizer. We evaluate BinMask on three tasks: feature selection, network
sparsification, and model regularization. Despite its simplicity, BinMask
achieves performance competitive with methods designed for each task on all
benchmarks, without any task-specific tuning. Our results suggest that
decoupling weights from mask optimization, which has been widely adopted by
previous work, is a key component for effective $L_0$ regularization.
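The abstract states the core mechanism concretely: weights are multiplied by deterministic binary masks, and gradients reach the underlying real-valued mask variables through the identity straight-through estimator. Below is a minimal PyTorch-style sketch of that idea; the class name, zero threshold, score initialization, and penalty coefficient are illustrative assumptions, and the sketch uses a single backward pass rather than the decoupled weight/mask optimization the paper identifies as important, so it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinMaskLinear(nn.Module):
    """Linear layer whose weights are gated by a deterministic binary mask.

    Illustrative sketch of the idea in the abstract, not the authors' code:
    a real-valued score per weight is thresholded at zero to produce a 0/1
    mask, and the identity straight-through estimator passes gradients to
    the scores unchanged.
    """

    def __init__(self, in_features: int, out_features: int, init_score: float = 1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # Real-valued mask scores; a positive score means "keep this weight".
        self.mask_score = nn.Parameter(
            torch.full((out_features, in_features), init_score)
        )

    def binary_mask(self) -> torch.Tensor:
        hard = (self.mask_score > 0).float()  # deterministic 0/1 mask
        # Identity STE: the forward value is the hard mask; the backward pass
        # sends the gradient straight through to mask_score.
        return hard + self.mask_score - self.mask_score.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight * self.binary_mask(), self.bias)

    def l0_penalty(self) -> torch.Tensor:
        # Number of active weights; differentiable via the STE above.
        return self.binary_mask().sum()


# Usage sketch: the penalty coefficient 1e-4 is an assumed hyperparameter.
layer = BinMaskLinear(64, 32)
x = torch.randn(8, 64)
loss = layer(x).pow(2).mean() + 1e-4 * layer.l0_penalty()
loss.backward()  # gradients flow to both weights and mask scores
```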
Related papers
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Why Random Pruning Is All We Need to Start Sparse [7.648170881733381]
Random masks define surprisingly effective sparse neural network models.
We show that sparser networks can compete with dense architectures and state-of-the-art lottery ticket pruning algorithms (a minimal random-mask sketch follows this list).
arXiv Detail & Related papers (2022-10-05T17:34:04Z)
- Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL).
Deep learning models trained with SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z)
- Robustness Certificates for Implicit Neural Networks: A Mixed Monotone
Contractive Approach [60.67748036747221]
Implicit neural networks offer competitive performance and reduced memory consumption.
They can remain brittle with respect to input adversarial perturbations.
This paper proposes a theoretical and computational framework for robustness verification of implicit neural networks.
arXiv Detail & Related papers (2021-12-10T03:08:55Z)
- Efficiently Learning Any One Hidden Layer ReLU Network From Queries [27.428198343906352]
We give the first polynomial-time algorithm for learning arbitrary one hidden layer neural networks with ReLU activations, given black-box query access to the network.
Ours is the first algorithm with fully polynomial-time guarantees of efficiency even for worst-case networks.
arXiv Detail & Related papers (2021-11-08T18:59:40Z)
- Masksembles for Uncertainty Estimation [60.400102501013784]
Deep neural networks have amply demonstrated their prowess but estimating the reliability of their predictions remains challenging.
Deep Ensembles are widely considered one of the best methods for generating uncertainty estimates but are very expensive to train and evaluate.
MC-Dropout is another popular alternative, which is less expensive, but also less reliable.
arXiv Detail & Related papers (2020-12-15T14:39:57Z)
- Binary Stochastic Filtering: feature selection and beyond [0.0]
This work aims to extend neural networks with the ability to automatically select features by rethinking how sparsity regularization can be used.
The proposed method demonstrates superior efficiency compared to several classical feature-selection methods, with minimal or no computational overhead.
arXiv Detail & Related papers (2020-07-08T06:57:10Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance imposes a heavy computational burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
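The random-mask sketch referenced in the "Why Random Pruning Is All We Need to Start Sparse" entry above is a minimal illustration of sparsifying a layer with a fixed random binary mask. PyTorch, a single linear layer, and a 20% keep density are assumptions for the example; it does not reproduce that paper's layer-wise sparsity choices.

```python
import torch
import torch.nn as nn


def apply_random_mask(layer: nn.Linear, density: float = 0.2) -> torch.Tensor:
    """Prune `layer` with a fixed random binary mask keeping `density` of weights.

    Illustrative sketch only: the mask is sampled once, stored as a buffer,
    applied to the weights, and a gradient hook keeps pruned weights at zero.
    """
    mask = (torch.rand_like(layer.weight) < density).float()
    layer.register_buffer("weight_mask", mask)
    with torch.no_grad():
        layer.weight.mul_(mask)
    # Zero the gradients of pruned weights so plain SGD keeps them at zero.
    layer.weight.register_hook(lambda grad: grad * mask)
    return mask


# Usage: mask a layer at 20% density before training.
layer = nn.Linear(128, 64)
mask = apply_random_mask(layer, density=0.2)
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```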
This list is automatically generated from the titles and abstracts of the papers on this site.