Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks
- URL: http://arxiv.org/abs/2207.13423v1
- Date: Wed, 27 Jul 2022 10:04:23 GMT
- Title: Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks
- Authors: Yooshin Cho, Youngsoo Kim, Hanbyel Cho, Jaesung Ahn, Hyeong Gwon Hong,
Junmo Kim
- Abstract summary: The non-local (NL) block is a popular module that demonstrates the capability to model global contexts.
We empirically analyze whether the magnitude and direction of input feature vectors properly affect the attention between vectors.
By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
- Score: 22.240253892754932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The non-local (NL) block is a popular module that demonstrates the capability to
model global contexts. However, the NL block generally has heavy computation and
memory costs, so it is impractical to apply it to high-resolution feature maps.
In this paper, to investigate the efficacy of the NL block, we empirically analyze
whether the magnitude and direction of input feature vectors properly affect the
attention between vectors. The results show the inefficacy of the softmax operation
that is generally used to normalize the attention map of the NL block. Attention
maps normalized with the softmax operation rely heavily on the magnitude of the key
vectors, and performance degrades if the magnitude information is removed. By
replacing the softmax operation with a scaling factor, we demonstrate improved
performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method is
robust to embedding channel reduction and embedding weight initialization. Notably,
our method makes multi-head attention applicable without additional computational
cost.
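To connect the abstract to code, the sketch below contrasts the two normalization schemes in a simplified non-local block: the usual softmax over the similarity map versus division by a fixed scaling factor. This is a minimal PyTorch illustration, not the authors' implementation; in particular, the class and layer names, the single-head layout, and the choice of 1/N (N = number of spatial positions) as the scaling factor are assumptions made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalBlock(nn.Module):
    """Simplified non-local block with switchable attention normalization (sketch)."""

    def __init__(self, in_channels: int, embed_channels: int, use_softmax: bool = True):
        super().__init__()
        self.theta = nn.Conv2d(in_channels, embed_channels, 1)  # query embedding
        self.phi = nn.Conv2d(in_channels, embed_channels, 1)    # key embedding
        self.g = nn.Conv2d(in_channels, embed_channels, 1)      # value embedding
        self.out = nn.Conv2d(embed_channels, in_channels, 1)    # project back to input channels
        self.use_softmax = use_softmax

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        n = h * w
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, N, C')
        k = self.phi(x).flatten(2)                      # (B, C', N)
        v = self.g(x).flatten(2).transpose(1, 2)        # (B, N, C')

        sim = torch.bmm(q, k)                           # (B, N, N) pairwise similarities
        if self.use_softmax:
            attn = F.softmax(sim, dim=-1)               # standard NL-block normalization
        else:
            attn = sim / n                              # scaling-factor normalization (1/N is an assumed choice)

        y = torch.bmm(attn, v)                          # (B, N, C') aggregated context
        y = y.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection, as in the original NL block


# Usage: the scaling-factor variant is a drop-in replacement for the softmax one.
x = torch.randn(2, 64, 32, 32)
block = NonLocalBlock(64, 32, use_softmax=False)
y = block(x)  # same shape as x
```

Setting `use_softmax=False` swaps in the scaling-factor normalization studied in the paper; the exact factor and the multi-head formulation are given in the paper itself.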
Related papers
- Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention [19.618556742380086]
We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption.
To enhance accuracy while preserving efficacy, we introduce TransNormerLLM (TNL), a new architecture that is tailored to our lightning attention.
arXiv Detail & Related papers (2024-05-27T17:38:13Z)
- Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning [0.0]
Sparsity in Deep Neural Networks (DNNs) has gained attention as a solution.
This work focuses on ephemeral sparsity, aiming to reduce memory consumption during training.
We report the effectiveness of activation pruning by evaluating training speed, accuracy, and memory usage of large-scale neural architectures.
arXiv Detail & Related papers (2023-11-28T15:31:31Z)
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
- Efficient Non-Local Contrastive Attention for Image Super-Resolution [48.093500219958834]
Non-Local Attention (NLA) brings significant improvement for Single Image Super-Resolution (SISR) by leveraging intrinsic feature correlation in natural images.
We propose a novel Efficient Non-Local Contrastive Attention (ENLCA) to perform long-range visual modeling and leverage more relevant non-local features.
arXiv Detail & Related papers (2022-01-11T05:59:09Z)
- SiMaN: Sign-to-Magnitude Network Binarization [165.5630656849309]
We show that our weight binarization provides an analytical solution by encoding high-magnitude weights into +1s and 0s otherwise.
We prove that the learned weights of binarized networks roughly follow a Laplacian distribution that does not allow entropy maximization.
Our method, dubbed sign-to-magnitude network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet; a toy sketch of the magnitude-based encoding appears after this list.
arXiv Detail & Related papers (2021-02-16T07:03:51Z)
- LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)
- Taming GANs with Lookahead-Minmax [63.90038365274479]
Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient.
Using 30-fold fewer parameters and 16-fold smaller minibatches, we outperform the reported performance of the class-dependent BigGAN on CIFAR-10, obtaining an FID of 12.19 without using the class labels.
arXiv Detail & Related papers (2020-06-25T17:13:23Z)
- Neural Architecture Search for Lightweight Non-Local Networks [66.49621237326959]
Non-Local (NL) blocks have been widely studied in various vision tasks.
We propose a Lightweight Non-Local (LightNL) block by squeezing the transformation operations and incorporating compact features.
We also propose an efficient neural architecture search algorithm to learn an optimal configuration of LightNL blocks in an end-to-end manner.
arXiv Detail & Related papers (2020-04-04T15:46:39Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP).
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
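As referenced in the SiMaN entry above, the following toy sketch illustrates the magnitude-based encoding described there: weights whose magnitude falls in the top fraction become +1 and the rest become 0. The function name, the keep ratio of 0.5, and the per-tensor thresholding are illustrative assumptions; SiMaN derives its actual encoding analytically during training.

```python
import torch


def magnitude_binarize(w: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Toy sign-to-magnitude style encoding: top-|w| entries -> 1, others -> 0.

    keep_ratio and per-tensor thresholding are assumptions for illustration only.
    """
    k = max(1, int(keep_ratio * w.numel()))
    # Threshold at the k-th largest magnitude in this tensor.
    threshold = torch.topk(w.abs().flatten(), k).values[-1]
    return (w.abs() >= threshold).to(w.dtype)


# Example: binarize a random weight tensor into {0., 1.} entries.
w = torch.randn(4, 4)
w_bin = magnitude_binarize(w)
```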
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.