Spatial Bias for Attention-free Non-local Neural Networks
- URL: http://arxiv.org/abs/2302.12505v1
- Date: Fri, 24 Feb 2023 08:16:16 GMT
- Title: Spatial Bias for Attention-free Non-local Neural Networks
- Authors: Junhyung Go, Jongbin Ryu
- Abstract summary: We introduce the spatial bias to learn global knowledge without self-attention in convolutional neural networks.
We show that the spatial bias achieves competitive performance, improving classification accuracy by +0.79% on ImageNet-1K and +1.5% on CIFAR-100.
- Score: 11.320414512937946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the spatial bias to learn global knowledge
without self-attention in convolutional neural networks. Owing to the limited
receptive field, conventional convolutional neural networks have difficulty
learning long-range dependencies. Non-local neural networks capture such global
knowledge, but the self-attention operation makes their design unavoidably
heavy. Therefore, we propose a fast and lightweight
spatial bias that efficiently encodes global knowledge without self-attention
on convolutional neural networks. The spatial bias is stacked on the feature map
and convolved together with it to adjust the spatial structure of the
convolutional features. In this way, global knowledge is learned directly in the
convolution layer with very few additional resources. Our method is very fast and
lightweight due to the attention-free non-local method while improving the
performance of neural networks considerably. Compared to non-local neural
networks, the spatial bias uses about 10 times fewer parameters while achieving
comparable performance with 1.6 to 3.3 times higher throughput at very little
additional cost. Furthermore, the spatial bias can be combined with conventional
non-local neural networks to further improve the performance of the backbone
model. We show that the spatial bias achieves competitive performance, improving
classification accuracy by +0.79% on ImageNet-1K and +1.5% on CIFAR-100.
Additionally, we validate our method on the MS-COCO and ADE20K
datasets for downstream tasks involving object detection and semantic
segmentation.
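
The abstract describes the mechanism only at a high level, so the following PyTorch sketch illustrates the "stack a spatial bias on the feature map and convolve them together" idea under stated assumptions: the module name, the channel-compression branch, the pooled context resolution, and the number of bias channels are placeholders rather than the authors' exact design.

```python
# Minimal sketch of stacking a learned spatial bias onto a feature map before
# the next convolution. Everything below (channel reduction, pooled context
# size, number of bias channels) is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialBiasBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int,
                 num_bias_channels: int = 2, reduction: int = 8):
        super().__init__()
        reduced = max(in_channels // reduction, 1)
        # Lightweight branch that summarizes global context.
        self.compress = nn.Conv2d(in_channels, reduced, kernel_size=1)
        self.to_bias = nn.Conv2d(reduced, num_bias_channels, kernel_size=1)
        # The main convolution sees the original features plus the bias maps,
        # so global knowledge enters the convolution itself (attention-free).
        self.conv = nn.Conv2d(in_channels + num_bias_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Aggregate global context at a coarse (assumed) resolution.
        ctx = F.adaptive_avg_pool2d(self.compress(x), (max(h // 4, 1), max(w // 4, 1)))
        bias = self.to_bias(ctx)
        # Upsample the bias maps and stack them on the feature map.
        bias = F.interpolate(bias, size=(h, w), mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, bias], dim=1))


if __name__ == "__main__":
    block = SpatialBiasBlock(in_channels=64, out_channels=64)
    print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

Concatenating a handful of bias channels before an ordinary 3x3 convolution is what keeps the extra parameter and compute cost small compared with a self-attention block.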
Related papers
- Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation [36.41451383422967]
In scientific applications, neural networks are generally of moderate size, mainly to ensure fast inference.
Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity.
We propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems.
arXiv Detail & Related papers (2024-05-02T06:53:40Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
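As a rough illustration of the receptive-field restriction mentioned above, here is a small PyTorch sketch of an overcomplete encoder: upsampling before each convolution keeps the receptive field, measured in input pixels, small. The layer sizes, the pooling-based decoder, and the residual head are assumptions, not the paper's architecture.

```python
# Sketch of an "overcomplete" encoder that restricts the receptive field:
# upsampling before each 3x3 convolution means every convolution covers a
# smaller neighborhood of the input image, keeping the focus on low-level,
# local statistics. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OvercompleteDespeckler(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.enc1 = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.enc2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.dec1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encoder: upsample, then convolve (the overcomplete branch).
        f = F.relu(self.enc1(F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)))
        f = F.relu(self.enc2(F.interpolate(f, scale_factor=2, mode="bilinear", align_corners=False)))
        # Decoder: return to the input resolution.
        f = F.relu(self.dec1(F.max_pool2d(f, kernel_size=2)))
        f = F.max_pool2d(f, kernel_size=2)
        # Predict the clean image via an (assumed) residual head.
        return x - self.head(f)


if __name__ == "__main__":
    noisy = torch.rand(1, 1, 64, 64)  # a single-channel SAR patch
    print(OvercompleteDespeckler()(noisy).shape)  # torch.Size([1, 1, 64, 64])
```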
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
- HyBNN and FedHyBNN: (Federated) Hybrid Binary Neural Networks [0.0]
We introduce a novel hybrid neural network architecture, the Hybrid Binary Neural Network (HyBNN).
HyBNN consists of a task-independent, general, full-precision variational autoencoder with a binary latent space and a task-specific binary neural network.
We show that our proposed system is able to very significantly outperform a vanilla binary neural network with input binarization.
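A rough sketch of the two-stage structure described above, assuming a straight-through sign for the binary latent code and on-the-fly weight binarization for the task head; the VAE decoder and KL term are omitted, and all layer sizes are placeholders.

```python
# Illustrative sketch only: full-precision encoder -> binary latent code ->
# binarized task head. The straight-through estimator and layer widths are
# assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarySign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator


class BinaryLatentClassifier(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Task head whose weights are binarized on the fly at forward time.
        self.head = nn.Linear(latent_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = BinarySign.apply(self.encoder(x))    # binary latent code
        w = BinarySign.apply(self.head.weight)   # binarized task weights
        return F.linear(z, w, self.head.bias)


print(BinaryLatentClassifier()(torch.randn(4, 784)).shape)  # torch.Size([4, 10])
```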
arXiv Detail & Related papers (2022-05-19T20:27:01Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Mining the Weights Knowledge for Optimizing Neural Network Structures [1.995792341399967]
We introduce a switcher neural network (SNN) that uses as inputs the weights of a task-specific neural network (called TNN for short).
By mining the knowledge contained in the weights, the SNN outputs scaling factors for turning off neurons in the TNN.
In terms of accuracy, we outperform baseline networks and other structure learning methods consistently and significantly.
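A small, hedged sketch of the switcher idea: a tiny network reads statistics of a task layer's weights and emits per-neuron gates. The choice of weight statistics and the sigmoid gate are assumptions made only for illustration.

```python
# Illustrative switcher: map per-neuron weight statistics of a TNN layer to
# scaling factors in [0, 1] that can switch neurons off. The statistics used
# and the gating form are assumptions.
import torch
import torch.nn as nn


class Switcher(nn.Module):
    """Maps simple statistics of each neuron's incoming weights to a gate."""
    def __init__(self, num_stats: int = 2, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_stats, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # weight: (out_features, in_features) of a TNN linear layer.
        stats = torch.stack([weight.abs().mean(dim=1),
                             weight.std(dim=1)], dim=1)  # (out_features, 2)
        return self.net(stats).squeeze(-1)               # (out_features,)


# Usage: gate a TNN layer's activations with the switcher's output.
tnn_layer = nn.Linear(128, 64)
switcher = Switcher()
gates = switcher(tnn_layer.weight)              # one scaling factor per neuron
gated = tnn_layer(torch.randn(8, 128)) * gates  # near-zero gates turn neurons off
print(gated.shape)                              # torch.Size([8, 64])
```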
arXiv Detail & Related papers (2021-10-11T05:20:56Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Improving Neural Network with Uniform Sparse Connectivity [0.0]
We propose the novel uniform sparse network (USN) with even and sparse connectivity within each layer.
USN consistently and substantially outperforms the state-of-the-art sparse network models in prediction accuracy, speed and robustness.
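A minimal sketch of "even and sparse connectivity within each layer": a linear layer whose fixed binary mask gives every output neuron the same number of evenly spread incoming connections. The mask pattern and the fan-in value are illustrative assumptions.

```python
# Illustrative uniform sparse layer: each output neuron keeps exactly fan_in
# connections, spread evenly over the inputs by a fixed binary mask.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UniformSparseLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, fan_in: int = 8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        mask = torch.zeros(out_features, in_features)
        # Spread each neuron's fan_in connections evenly over the inputs.
        for i in range(out_features):
            cols = (torch.arange(fan_in) * in_features // fan_in + i) % in_features
            mask[i, cols] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The mask keeps the connectivity pattern fixed, uniform, and sparse.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


layer = UniformSparseLinear(64, 32, fan_in=8)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 32])
```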
arXiv Detail & Related papers (2020-11-29T19:00:05Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
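A minimal sketch of the learnable-edge idea: every node aggregates the outputs of all earlier nodes through sigmoid-gated scalar edge parameters, so the connectivity pattern is trained by gradient descent. The per-node operation and the node count are assumptions for illustration.

```python
# Illustrative stage with learnable connectivity: one scalar per directed edge
# of a complete DAG over the nodes, passed through a sigmoid and used to weight
# the aggregation. The 3x3 conv + ReLU node operation is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableConnectivityStage(nn.Module):
    def __init__(self, channels: int, num_nodes: int = 4):
        super().__init__()
        self.nodes = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_nodes))
        # One learnable scalar per edge (stage input plus each earlier node).
        self.edges = nn.Parameter(torch.zeros(num_nodes, num_nodes + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # index 0 is the stage input
        for j, node in enumerate(self.nodes):
            gates = torch.sigmoid(self.edges[j, : len(outputs)])
            agg = sum(g * o for g, o in zip(gates, outputs))
            outputs.append(F.relu(node(agg)))
        return outputs[-1]


stage = LearnableConnectivityStage(channels=16, num_nodes=4)
print(stage(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```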
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.