Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary
Layers in Deep Neural Networks
- URL: http://arxiv.org/abs/2104.07085v1
- Date: Wed, 14 Apr 2021 19:23:36 GMT
- Title: Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary
Layers in Deep Neural Networks
- Authors: Hongyi Pan, Diaa Dabawi and Ahmet Enis Cetin
- Abstract summary: We propose a layer based on fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution layers in deep neural networks.
Using these two types of layers, we replace the bottleneck layers in MobileNet-V2 to reduce the network's number of parameters with a slight loss in accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel layer based on fast Walsh-Hadamard
transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution
layers in deep neural networks. In the WHT domain, we denoise the transform
domain coefficients using the new smooth-thresholding non-linearity, a smoothed
version of the well-known soft-thresholding operator. We also introduce a
family of multiplication-free operators from the basic $2\times 2$ Hadamard
transform to implement $3\times 3$ depthwise separable convolution layers.
Using these two types of layers, we replace the bottleneck layers in
MobileNet-V2 to reduce the network's number of parameters with a slight loss in
accuracy. For example, by replacing the final third bottleneck layers, we
reduce the number of parameters from 2.270M to 947K. This reduces the accuracy
from 95.21\% to 92.88\% on the CIFAR-10 dataset. Our approach significantly
improves the speed of data processing. The fast Walsh-Hadamard transform has a
computational complexity of $O(m\log_2 m)$. As a result, it is computationally
more efficient than the $1\times1$ convolution layer. The fast Walsh-Hadamard
layer processes a tensor in $\mathbb{R}^{10\times32\times32\times1024}$ about 2
times faster than a $1\times 1$ convolution layer on an NVIDIA Jetson Nano computer
board.
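The two building blocks named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (fwht, soft_threshold, smooth_threshold, wht_layer) are hypothetical. The tanh-based form of the smooth-thresholding operator is an assumption, since the abstract only describes it as a smoothed version of soft-thresholding. The threshold T would be trainable in a real layer but is a fixed scalar here.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis.

    Uses log2(m) butterfly stages of m additions/subtractions each,
    i.e. O(m log2 m) operations and no multiplications.
    """
    x = np.asarray(x, dtype=np.float64).copy()
    m = x.shape[-1]
    if m < 1 or m & (m - 1):
        raise ValueError("last axis length must be a power of two")
    batch = x.shape[:-1]
    h = 1
    while h < m:
        # Group the axis into blocks of 2*h and combine the two halves.
        y = x.reshape(*batch, m // (2 * h), 2, h)
        a = y[..., 0, :] + y[..., 1, :]
        b = y[..., 0, :] - y[..., 1, :]
        x = np.stack([a, b], axis=-2).reshape(*batch, m)
        h *= 2
    return x


def soft_threshold(z, T):
    """Standard soft-thresholding: sign(z) * max(|z| - T, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - T, 0.0)


def smooth_threshold(z, T):
    """ASSUMED smoothed variant: sign(z) replaced by tanh(z)."""
    return np.tanh(z) * np.maximum(np.abs(z) - T, 0.0)


def wht_layer(x, T):
    """Hypothetical channel-mixing layer: WHT, threshold, inverse WHT.

    x has shape (..., channels); channels must be a power of two.
    """
    m = x.shape[-1]
    z = fwht(x)                  # to the WHT domain
    z = smooth_threshold(z, T)   # denoise the coefficients
    return fwht(z) / m           # inverse WHT (the WHT is its own inverse up to 1/m)
```

Because the transform itself has no weights, the only parameters such a layer would carry are the thresholds, which is consistent with the parameter reduction described above. Applying fwht twice returns the input scaled by m, which is why the inverse step divides by m.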
Related papers
- ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration [8.829482765731022]
$N:M$ sparsity is an emerging model compression method supported by more and more accelerators.
We propose ELSA, Exploiting Layer-wise $N:M$ Sparsity for ViTs.
arXiv Detail & Related papers (2024-09-15T12:14:24Z) - Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer, through a caching mechanism, can be readily removed even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z) - Kronecker-Factored Approximate Curvature for Modern Neural Network
Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z) - Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic
Programming [15.458305667190256]
We propose a novel depth compression algorithm which targets general convolution operations.
We achieve a $1.41\times$ speed-up with a 0.11%p accuracy gain in MobileNetV2-1.0 on ImageNet.
arXiv Detail & Related papers (2023-01-28T13:08:54Z) - Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural
Networks [7.906608953906891]
Convolution has been the core operation of modern deep neural networks.
We propose to use binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform.
We use WHT-based binary layers to replace some of the regular convolution layers in deep neural networks.
arXiv Detail & Related papers (2022-01-07T23:52:41Z) - Spike time displacement based error backpropagation in convolutional
spiking neural networks [0.6193838300896449]
In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures.
The evaluation results on the image classification task based on two popular benchmarks, MNIST and Fashion-MNIST, confirm that this algorithm is applicable in deep SNNs.
We consider a convolutional SNN with two sets of weights: real-valued weights that are updated in the backward pass and their signs, binary weights, that are employed in the feedforward process.
arXiv Detail & Related papers (2021-08-31T05:18:59Z) - HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using a neural-architecture-search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in the top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z) - 1$\times$N Block Pattern for Network Sparsity [90.43191747596491]
We propose a novel concept of $1\times N$ block sparsity pattern (block pruning) to break this limitation.
Our pattern obtains an improvement of about 3.0% over filter pruning in the top-1 accuracy of MobileNet-V2.
It also obtains 56.04ms inference savings on Cortex-A7 CPU over weight pruning.
arXiv Detail & Related papers (2021-05-31T05:50:33Z) - Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z) - DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.