Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural
Networks
- URL: http://arxiv.org/abs/2201.02711v1
- Date: Fri, 7 Jan 2022 23:52:41 GMT
- Title: Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural
Networks
- Authors: Hongyi Pan, Diaa Badawi, Ahmet Enis Cetin
- Abstract summary: Convolution has been the core operation of modern deep neural networks.
We propose to use binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform.
We use WHT-based binary layers to replace some of the regular convolution layers in deep neural networks.
- Score: 7.906608953906891
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolution has been the core operation of modern deep neural networks. It is
well-known that convolutions can be implemented in the Fourier Transform
domain. In this paper, we propose to use binary block Walsh-Hadamard transform
(WHT) instead of the Fourier transform. We use WHT-based binary layers to
replace some of the regular convolution layers in deep neural networks. We
utilize both one-dimensional (1-D) and two-dimensional (2-D) binary WHTs in
this paper. In both 1-D and 2-D layers, we compute the binary WHT of the input
feature map and denoise the WHT domain coefficients using a nonlinearity which
is obtained by combining soft-thresholding with the tanh function. After
denoising, we compute the inverse WHT. We use 1D-WHT to replace the $1\times 1$
convolutional layers, and 2D-WHT layers can replace the $3\times 3$ convolution
layers and Squeeze-and-Excite layers. 2D-WHT layers with trainable weights can
also be inserted before the Global Average Pooling (GAP) layers to assist the
dense layers. In this way, we can reduce the number of trainable parameters
significantly with only a slight decrease in accuracy. In this paper, we
implement the WHT layers into MobileNet-V2, MobileNet-V3-Large, and ResNet to
reduce the number of parameters significantly with negligible accuracy loss.
Moreover, according to our speed test, the 2D-FWHT layer runs about 24 times as
fast as the regular $3\times 3$ convolution with 19.51\% less RAM usage in an
NVIDIA Jetson Nano experiment.
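For illustration, below is a minimal PyTorch sketch (not the authors' code) of the 1-D WHT-based binary layer described above: a fast Walsh-Hadamard transform over the channel dimension, a smooth-thresholding nonlinearity, and the inverse WHT. The smooth-thresholding form tanh(z) * relu(|z| - T) is one plausible reading of "combining soft-thresholding with the tanh function", and the names fwht, WHT1DLayer, and the per-channel trainable threshold are illustrative assumptions.
```python
import torch
import torch.nn as nn

def fwht(x: torch.Tensor) -> torch.Tensor:
    """Unnormalized fast Walsh-Hadamard transform over the last dimension.
    The last dimension must have power-of-two length."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "WHT length must be a power of two"
    y = x
    h = 1
    while h < n:
        # radix-2 butterfly over blocks of size 2*h
        y = y.reshape(*y.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)
        y = y.reshape(*y.shape[:-3], n)
        h *= 2
    return y

class WHT1DLayer(nn.Module):
    """Sketch of a 1-D WHT binary layer in place of a 1x1 convolution:
    WHT along channels -> smooth-thresholding -> inverse WHT."""
    def __init__(self, channels: int):
        super().__init__()
        # one trainable threshold per channel (assumed parameterization)
        self.threshold = nn.Parameter(torch.zeros(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width); channels must be a power of two
        z = fwht(x.permute(0, 2, 3, 1))          # transform over the channel axis
        t = torch.abs(self.threshold)
        # smooth-thresholding: soft-threshold with sign() replaced by tanh()
        z = torch.tanh(z) * torch.relu(torch.abs(z) - t)
        y = fwht(z) / x.shape[1]                 # WHT is self-inverse up to 1/n
        return y.permute(0, 3, 1, 2)

# usage: WHT1DLayer(64)(torch.randn(2, 64, 8, 8)) returns a (2, 64, 8, 8) tensor
```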
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers.
We identify three key challenges: (C1) Base function, (C2) Inefficiency, and (C3) Weight initialization.
With these designs, KAT outperforms traditional MLP-based transformers.
arXiv Detail & Related papers (2024-09-16T17:54:51Z) - A Hybrid Quantum-Classical Approach based on the Hadamard Transform for
the Convolutional Layer [3.316567107326828]
We propose a novel Hadamard Transform-based neural network layer for hybrid quantum-classical computing.
The idea is based on the HT convolution theorem, which states that the dyadic convolution of two vectors is equivalent to the element-wise multiplication of their HT representations (a small numerical check of this identity is sketched after this list).
arXiv Detail & Related papers (2023-05-27T16:11:48Z) - Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets [2.829818195105779]
We propose a set of transform-based neural network layers as an alternative to the $3\times 3$ Conv2D layers in CNNs.
The proposed layers can be implemented based on transforms such as the Discrete Cosine Transform (DCT), the Hadamard transform (HT), and the biorthogonal Block Wavelet Transform (BWT).
arXiv Detail & Related papers (2023-03-13T01:07:32Z) - DCT Perceptron Layer: A Transform Domain Approach for Convolution Layer [3.506018346865459]
We propose a novel Discrete Cosine Transform (DCT)-based neural network layer which we call DCT-perceptron.
Convolutional filtering operations are performed in the DCT domain using element-wise multiplications.
The DCT-perceptron layer reduces the number of parameters and multiplications significantly.
arXiv Detail & Related papers (2022-11-15T23:44:56Z) - GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z) - HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using a neural architecture search like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z) - Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z) - Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary
Layers in Deep Neural Networks [0.0]
We propose a layer based on the fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution layers in deep neural networks.
Using these two types of layers, we replace the bottleneck layers in MobileNet-V2 to reduce the network's number of parameters with a slight loss in accuracy.
arXiv Detail & Related papers (2021-04-14T19:23:36Z) - DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z) - Depthwise-STFT based separable Convolutional Neural Networks [35.636461829966095]
We propose a new convolutional layer called Depthwise-STFT Separable layer.
It can serve as an alternative to the standard depthwise separable convolutional layer.
We show that the proposed layer outperforms the standard depthwise separable layer-based models on the CIFAR-10 and CIFAR-100 image classification datasets.
arXiv Detail & Related papers (2020-01-27T17:07:08Z)
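As a complement to the HT convolution theorem summarized in the hybrid quantum-classical paper above, the following NumPy sketch (illustrative only; fwht and dyadic_conv are hypothetical helper names) checks numerically that the WHT of the dyadic (XOR) convolution of two vectors equals the element-wise product of their WHTs:
```python
import numpy as np

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform of a 1-D array
    whose length is a power of two (natural/Hadamard ordering)."""
    a = a.astype(float)          # astype copies, so the input is not modified
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a

def dyadic_conv(x, y):
    """Dyadic (XOR) convolution: z[k] = sum_i x[i] * y[k XOR i]."""
    n = len(x)
    z = np.zeros(n)
    for k in range(n):
        for i in range(n):
            z[k] += x[i] * y[k ^ i]
    return z

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)
lhs = fwht(dyadic_conv(x, y))    # WHT of the dyadic convolution
rhs = fwht(x) * fwht(y)          # element-wise product of the WHTs
print(np.allclose(lhs, rhs))     # prints: True
```
This identity is what lets element-wise multiplication in the WHT domain play the role that convolution plays in the spatial domain.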