Accelerated Sparse Neural Training: A Provable and Efficient Method to
Find N:M Transposable Masks
- URL: http://arxiv.org/abs/2102.08124v1
- Date: Tue, 16 Feb 2021 12:44:16 GMT
- Title: Accelerated Sparse Neural Training: A Provable and Efficient Method to
Find N:M Transposable Masks
- Authors: Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor,
Daniel Soudry
- Abstract summary: Recently, researchers proposed pruning deep neural network (DNN) weights using an $N:M$ fine-grained block sparsity mask.
We propose a novel transposable-fine-grained sparsity mask where the same mask can be used for both forward and backward passes.
Our experiments suggest 2x speed-up with no accuracy degradation over vision and language models.
- Score: 28.498176073737422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, researchers proposed pruning deep neural network (DNN) weights
using an $N:M$ fine-grained block sparsity mask. In this mask, for each block
of $M$ weights, we have at least $N$ zeros. In contrast to unstructured
sparsity, $N:M$ fine-grained block sparsity allows acceleration in actual
modern hardware. So far, this was used for DNN acceleration at the inference
phase. First, we suggest a method to convert a pretrained model with
unstructured sparsity to an $N:M$ fine-grained block sparsity model, with little
to no training. Then, to also allow such acceleration in the training phase, we
suggest a novel transposable-fine-grained sparsity mask where the same mask can
be used for both forward and backward passes. Our transposable mask ensures
that both the weight matrix and its transpose follow the same sparsity pattern;
thus the matrix multiplication required for passing the error backward can also
be accelerated. We discuss the transposable constraint and devise a new measure
for mask constraints, called mask-diversity (MD), which correlates with their
expected accuracy. Then, we formulate the problem of finding the optimal
transposable mask as a minimum-cost-flow problem and suggest a fast linear
approximation that can be used when the masks dynamically change while
training. Our experiments suggest 2x speed-up with no accuracy degradation over
vision and language models. A reference implementation can be found at
https://github.com/papers-submission/structured_transposable_masks.
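To make the transposable constraint concrete, the following is a minimal, illustrative sketch in NumPy, not the paper's reference implementation (which selects each tile's mask optimally via minimum-cost flow). It builds a 2:4 mask with a simple greedy per-tile magnitude rule and then checks that both the mask and its transpose satisfy the N:M pattern; the function names and the greedy rule are assumptions made here for illustration only.

import numpy as np

def greedy_transposable_mask(w, n_keep=2, m=4):
    # Keep at most n_keep weights per row AND per column of every m x m tile,
    # visiting entries in order of decreasing magnitude. The resulting mask
    # leaves at least (m - n_keep) zeros in every length-m block of both the
    # matrix and its transpose, so the same mask can prune the forward and
    # backward matrix multiplications.
    rows, cols = w.shape
    assert rows % m == 0 and cols % m == 0, "pad W so both dims are multiples of m"
    mask = np.zeros_like(w)
    for i in range(0, rows, m):
        for j in range(0, cols, m):
            tile = np.abs(w[i:i + m, j:j + m])
            row_budget = np.full(m, n_keep)
            col_budget = np.full(m, n_keep)
            for flat in np.argsort(-tile, axis=None):  # largest magnitude first
                r, c = divmod(int(flat), m)
                if row_budget[r] > 0 and col_budget[c] > 0:
                    mask[i + r, j + c] = 1.0
                    row_budget[r] -= 1
                    col_budget[c] -= 1
    return mask

def satisfies_nm(mask, n_zeros=2, m=4):
    # The abstract's constraint: every block of m consecutive weights along a
    # row contains at least n_zeros zeros.
    blocks = mask.reshape(mask.shape[0], -1, m)
    return bool(((blocks == 0).sum(axis=-1) >= n_zeros).all())

w = np.random.randn(8, 8)
mask = greedy_transposable_mask(w)
print(satisfies_nm(mask), satisfies_nm(mask.T))  # True True -> mask is transposable

Because each 4x4 tile keeps at most two weights per row and per column, the same mask sparsifies both the forward product and the transposed product used to pass the error backward, which is the property the paper exploits for training-time acceleration.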
Related papers
- MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models [91.4190318047519]
This work introduces MaskLLM, a learnable pruning method that establishes Semi-structured (or "N:M") Sparsity in Large Language Models.
MaskLLM explicitly models N:M patterns as a learnable distribution through Gumbel Softmax sampling; a rough illustrative sketch of this idea appears after this list.
arXiv Detail & Related papers (2024-09-26T02:37:41Z)
- MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
- Bi-directional Masks for Efficient N:M Sparse Training [64.9617631724811]
We present a novel method of Bi-directional Masks (Bi-Mask) with its two central innovations.
It disentangles the forward and backward weight sparsity and overcomes the otherwise dense gradient computation.
Compared with the existing uni-directional approach, which applies a transposable mask to enable backward acceleration, Bi-Mask is experimentally demonstrated to deliver superior performance.
arXiv Detail & Related papers (2023-02-13T02:32:02Z)
- Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z)
- Mask Transfiner for High-Quality Instance Segmentation [95.74244714914052]
We present Mask Transfiner for high-quality and efficient instance segmentation.
Our approach only processes detected error-prone tree nodes and self-corrects their errors in parallel.
Our code and trained models will be available at http://vis.xyz/pub/transfiner.
arXiv Detail & Related papers (2021-11-26T18:58:22Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, which is known as catastrophic forgetting.
Recent continual learning methods can alleviate catastrophic forgetting on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)
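As referenced in the MaskLLM item above, the following is a rough, illustrative-only sketch of treating the choice of N:M pattern in each block of weights as a learnable categorical distribution sampled with the Gumbel-Softmax trick. The shapes, names, and the plain-NumPy, gradient-free setting are assumptions made here; MaskLLM's actual parameterization is described in its own paper and code.

import itertools
import numpy as np

# All C(4, 2) = 6 binary patterns that keep exactly 2 of every 4 weights.
PATTERNS = np.array(
    [[1.0 if i in keep else 0.0 for i in range(4)]
     for keep in itertools.combinations(range(4), 2)]
)  # shape (6, 4)

def gumbel_softmax_mask(logits, tau=1.0, rng=None):
    # logits: (num_blocks, 6) learnable scores, one row per block of 4 weights.
    # Returns a soft (num_blocks, 4) mask: a Gumbel-Softmax-weighted mixture of
    # the six admissible 2:4 patterns; as tau -> 0 the mixture approaches a
    # hard one-pattern-per-block choice.
    if rng is None:
        rng = np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = logits + gumbel
    scores = np.exp((z - z.max(axis=-1, keepdims=True)) / tau)
    scores /= scores.sum(axis=-1, keepdims=True)  # softmax over the 6 patterns
    return scores @ PATTERNS

soft_mask = gumbel_softmax_mask(np.zeros((3, 6)), tau=0.5)
print(soft_mask.round(2))  # each row sums to 2: roughly two of four weights kept

In an actual training setup the logits would be optimized end-to-end together with the model weights and the temperature annealed so that each block converges to a single hard 2:4 pattern.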