Effective Neural Network $L_0$ Regularization With BinMask
- URL: http://arxiv.org/abs/2304.11237v1
- Date: Fri, 21 Apr 2023 20:08:57 GMT
- Title: Effective Neural Network $L_0$ Regularization With BinMask
- Authors: Kai Jia, Martin Rinard
- Abstract summary: We show that a straightforward formulation, BinMask, is an effective $L_0$ regularizer.
We evaluate BinMask on three tasks: feature selection, network sparsification, and model regularization.
- Score: 15.639601066641099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $L_0$ regularization of neural networks is a fundamental problem. In addition
to regularizing models for better generalizability, $L_0$ regularization also
applies to selecting input features and training sparse neural networks. There
is a large body of research on related topics, some with quite complicated
methods. In this paper, we show that a straightforward formulation, BinMask,
which multiplies weights with deterministic binary masks and uses the identity
straight-through estimator for backpropagation, is an effective $L_0$
regularizer. We evaluate BinMask on three tasks: feature selection, network
sparsification, and model regularization. Despite its simplicity, BinMask
achieves performance competitive with methods designed for each task on all
benchmarks, without any task-specific tuning. Our results suggest that
decoupling weights from mask optimization, which has been widely adopted by
previous work, is a key component for effective $L_0$ regularization.
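The abstract states the core mechanism concretely: weights are multiplied by deterministic binary masks, and gradients reach the underlying real-valued mask variables through the identity straight-through estimator. Below is a minimal PyTorch-style sketch of that idea; the class name, zero threshold, score initialization, and penalty coefficient are illustrative assumptions, and the sketch uses a single backward pass rather than the decoupled weight/mask optimization the paper identifies as important, so it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinMaskLinear(nn.Module):
    """Linear layer whose weights are gated by a deterministic binary mask.

    Illustrative sketch of the idea in the abstract, not the authors' code:
    a real-valued score per weight is thresholded at zero to produce a 0/1
    mask, and the identity straight-through estimator passes gradients to
    the scores unchanged.
    """

    def __init__(self, in_features: int, out_features: int, init_score: float = 1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # Real-valued mask scores; a positive score means "keep this weight".
        self.mask_score = nn.Parameter(
            torch.full((out_features, in_features), init_score)
        )

    def binary_mask(self) -> torch.Tensor:
        hard = (self.mask_score > 0).float()  # deterministic 0/1 mask
        # Identity STE: the forward value is the hard mask; the backward pass
        # sends the gradient straight through to mask_score.
        return hard + self.mask_score - self.mask_score.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight * self.binary_mask(), self.bias)

    def l0_penalty(self) -> torch.Tensor:
        # Number of active weights; differentiable via the STE above.
        return self.binary_mask().sum()


# Usage sketch: the penalty coefficient 1e-4 is an assumed hyperparameter.
layer = BinMaskLinear(64, 32)
x = torch.randn(8, 64)
loss = layer(x).pow(2).mean() + 1e-4 * layer.l0_penalty()
loss.backward()  # gradients flow to both weights and mask scores
```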
Related papers
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Why Random Pruning Is All We Need to Start Sparse [7.648170881733381]
Random masks define surprisingly effective sparse neural network models.
We show that sparser networks can compete with dense architectures and state-of-the-art lottery ticket pruning algorithms (a minimal random-mask sketch follows this list).
arXiv Detail & Related papers (2022-10-05T17:34:04Z)
- Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL).
Deep learning models trained with SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z)
- Robustness Certificates for Implicit Neural Networks: A Mixed Monotone
Contractive Approach [60.67748036747221]
Implicit neural networks offer competitive performance and reduced memory consumption.
They can remain brittle with respect to input adversarial perturbations.
This paper proposes a theoretical and computational framework for robustness verification of implicit neural networks.
arXiv Detail & Related papers (2021-12-10T03:08:55Z)
- Efficiently Learning Any One Hidden Layer ReLU Network From Queries [27.428198343906352]
We give the first polynomial-time algorithm for learning arbitrary one hidden layer neural networks with ReLU activations, given black-box query access to the network.
Ours is the first algorithm with fully polynomial-time guarantees of efficiency even for worst-case networks.
arXiv Detail & Related papers (2021-11-08T18:59:40Z)
- Masksembles for Uncertainty Estimation [60.400102501013784]
Deep neural networks have amply demonstrated their prowess but estimating the reliability of their predictions remains challenging.
Deep Ensembles are widely considered one of the best methods for generating uncertainty estimates but are very expensive to train and evaluate.
MC-Dropout is another popular alternative, which is less expensive, but also less reliable.
arXiv Detail & Related papers (2020-12-15T14:39:57Z)
- Binary Stochastic Filtering: feature selection and beyond [0.0]
This work aims to extend neural networks with the ability to automatically select features by rethinking how sparsity regularization can be used.
The proposed method demonstrates superior efficiency compared to several classical feature-selection methods, with minimal or no computational overhead.
arXiv Detail & Related papers (2020-07-08T06:57:10Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance imposes a heavy computational burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
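The random-mask sketch referenced in the "Why Random Pruning Is All We Need to Start Sparse" entry above is a minimal illustration of sparsifying a layer with a fixed random binary mask. PyTorch, a single linear layer, and a 20% keep density are assumptions for the example; it does not reproduce that paper's layer-wise sparsity choices.

```python
import torch
import torch.nn as nn


def apply_random_mask(layer: nn.Linear, density: float = 0.2) -> torch.Tensor:
    """Prune `layer` with a fixed random binary mask keeping `density` of weights.

    Illustrative sketch only: the mask is sampled once, stored as a buffer,
    applied to the weights, and a gradient hook keeps pruned weights at zero.
    """
    mask = (torch.rand_like(layer.weight) < density).float()
    layer.register_buffer("weight_mask", mask)
    with torch.no_grad():
        layer.weight.mul_(mask)
    # Zero the gradients of pruned weights so plain SGD keeps them at zero.
    layer.weight.register_hook(lambda grad: grad * mask)
    return mask


# Usage: mask a layer at 20% density before training.
layer = nn.Linear(128, 64)
mask = apply_random_mask(layer, density=0.2)
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```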
This list is automatically generated from the titles and abstracts of the papers on this site.