Spatial Bias for Attention-free Non-local Neural Networks
- URL: http://arxiv.org/abs/2302.12505v1
- Date: Fri, 24 Feb 2023 08:16:16 GMT
- Title: Spatial Bias for Attention-free Non-local Neural Networks
- Authors: Junhyung Go, Jongbin Ryu
- Abstract summary: We introduce the spatial bias to learn global knowledge without self-attention in convolutional neural networks.
We show that the spatial bias achieves competitive performance, improving classification accuracy by +0.79% on ImageNet-1K and +1.5% on CIFAR-100.
- Score: 11.320414512937946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the spatial bias to learn global knowledge
without self-attention in convolutional neural networks. Owing to the limited
receptive field, conventional convolutional neural networks have difficulty
learning long-range dependencies. Non-local neural networks capture such global
knowledge, but the self-attention operation makes their design unavoidably
heavy. Therefore, we propose a fast and lightweight
spatial bias that efficiently encodes global knowledge without self-attention
on convolutional neural networks. The spatial bias is stacked on the feature map
and convolved together with it to adjust the spatial structure of the
convolutional features. In this way, global knowledge is learned directly in the
convolution layer with very few additional resources. Our method is very fast and
lightweight due to the attention-free non-local method while improving the
performance of neural networks considerably. Compared to non-local neural
networks, the spatial bias uses about 10 times fewer parameters while achieving
comparable performance with 1.6 to 3.3 times higher throughput at very little
additional cost. Furthermore, the spatial bias can be combined with conventional
non-local neural networks to further improve the performance of the backbone
model. We show that the spatial bias achieves competitive performance, improving
classification accuracy by +0.79% on ImageNet-1K and +1.5% on CIFAR-100.
Additionally, we validate our method on the MS-COCO and ADE20K
datasets for downstream tasks involving object detection and semantic
segmentation.
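
The abstract describes the mechanism only at a high level, so the following PyTorch sketch illustrates the "stack a spatial bias on the feature map and convolve them together" idea under stated assumptions: the module name, the channel-compression branch, the pooled context resolution, and the number of bias channels are placeholders rather than the authors' exact design.

```python
# Minimal sketch of stacking a learned spatial bias onto a feature map before
# the next convolution. Everything below (channel reduction, pooled context
# size, number of bias channels) is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialBiasBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int,
                 num_bias_channels: int = 2, reduction: int = 8):
        super().__init__()
        reduced = max(in_channels // reduction, 1)
        # Lightweight branch that summarizes global context.
        self.compress = nn.Conv2d(in_channels, reduced, kernel_size=1)
        self.to_bias = nn.Conv2d(reduced, num_bias_channels, kernel_size=1)
        # The main convolution sees the original features plus the bias maps,
        # so global knowledge enters the convolution itself (attention-free).
        self.conv = nn.Conv2d(in_channels + num_bias_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Aggregate global context at a coarse (assumed) resolution.
        ctx = F.adaptive_avg_pool2d(self.compress(x), (max(h // 4, 1), max(w // 4, 1)))
        bias = self.to_bias(ctx)
        # Upsample the bias maps and stack them on the feature map.
        bias = F.interpolate(bias, size=(h, w), mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, bias], dim=1))


if __name__ == "__main__":
    block = SpatialBiasBlock(in_channels=64, out_channels=64)
    print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

Concatenating a handful of bias channels before an ordinary 3x3 convolution is what keeps the extra parameter and compute cost small compared with a self-attention block.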
Related papers
- Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation [36.41451383422967]
In scientific applications, neural networks are generally of moderate size, mainly to ensure fast inference.
Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity.
We propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems.
arXiv Detail & Related papers (2024-05-02T06:53:40Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
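As a rough illustration of the receptive-field restriction mentioned above, here is a small PyTorch sketch of an overcomplete encoder: upsampling before each convolution keeps the receptive field, measured in input pixels, small. The layer sizes, the pooling-based decoder, and the residual head are assumptions, not the paper's architecture.

```python
# Sketch of an "overcomplete" encoder that restricts the receptive field:
# upsampling before each 3x3 convolution means every convolution covers a
# smaller neighborhood of the input image, keeping the focus on low-level,
# local statistics. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OvercompleteDespeckler(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.enc1 = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.enc2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.dec1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encoder: upsample, then convolve (the overcomplete branch).
        f = F.relu(self.enc1(F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)))
        f = F.relu(self.enc2(F.interpolate(f, scale_factor=2, mode="bilinear", align_corners=False)))
        # Decoder: return to the input resolution.
        f = F.relu(self.dec1(F.max_pool2d(f, kernel_size=2)))
        f = F.max_pool2d(f, kernel_size=2)
        # Predict the clean image via an (assumed) residual head.
        return x - self.head(f)


if __name__ == "__main__":
    noisy = torch.rand(1, 1, 64, 64)  # a single-channel SAR patch
    print(OvercompleteDespeckler()(noisy).shape)  # torch.Size([1, 1, 64, 64])
```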
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
- HyBNN and FedHyBNN: (Federated) Hybrid Binary Neural Networks [0.0]
We introduce a novel hybrid neural network architecture, the Hybrid Binary Neural Network (HyBNN).
HyBNN consists of a task-independent, general, full-precision variational autoencoder with a binary latent space and a task-specific binary neural network.
We show that our proposed system is able to very significantly outperform a vanilla binary neural network with input binarization.
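A rough sketch of the two-stage structure described above, assuming a straight-through sign for the binary latent code and on-the-fly weight binarization for the task head; the VAE decoder and KL term are omitted, and all layer sizes are placeholders.

```python
# Illustrative sketch only: full-precision encoder -> binary latent code ->
# binarized task head. The straight-through estimator and layer widths are
# assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarySign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator


class BinaryLatentClassifier(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Task head whose weights are binarized on the fly at forward time.
        self.head = nn.Linear(latent_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = BinarySign.apply(self.encoder(x))    # binary latent code
        w = BinarySign.apply(self.head.weight)   # binarized task weights
        return F.linear(z, w, self.head.bias)


print(BinaryLatentClassifier()(torch.randn(4, 784)).shape)  # torch.Size([4, 10])
```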
arXiv Detail & Related papers (2022-05-19T20:27:01Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Mining the Weights Knowledge for Optimizing Neural Network Structures [1.995792341399967]
We introduce a switcher neural network (SNN) that uses as inputs the weights of a task-specific neural network (called TNN for short).
By mining the knowledge contained in the weights, the SNN outputs scaling factors for turning off neurons in the TNN.
In terms of accuracy, we outperform baseline networks and other structure learning methods consistently and significantly.
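A small, hedged sketch of the switcher idea: a tiny network reads statistics of a task layer's weights and emits per-neuron gates. The choice of weight statistics and the sigmoid gate are assumptions made only for illustration.

```python
# Illustrative switcher: map per-neuron weight statistics of a TNN layer to
# scaling factors in [0, 1] that can switch neurons off. The statistics used
# and the gating form are assumptions.
import torch
import torch.nn as nn


class Switcher(nn.Module):
    """Maps simple statistics of each neuron's incoming weights to a gate."""
    def __init__(self, num_stats: int = 2, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_stats, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # weight: (out_features, in_features) of a TNN linear layer.
        stats = torch.stack([weight.abs().mean(dim=1),
                             weight.std(dim=1)], dim=1)  # (out_features, 2)
        return self.net(stats).squeeze(-1)               # (out_features,)


# Usage: gate a TNN layer's activations with the switcher's output.
tnn_layer = nn.Linear(128, 64)
switcher = Switcher()
gates = switcher(tnn_layer.weight)              # one scaling factor per neuron
gated = tnn_layer(torch.randn(8, 128)) * gates  # near-zero gates turn neurons off
print(gated.shape)                              # torch.Size([8, 64])
```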
arXiv Detail & Related papers (2021-10-11T05:20:56Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Improving Neural Network with Uniform Sparse Connectivity [0.0]
We propose the novel uniform sparse network (USN) with even and sparse connectivity within each layer.
USN consistently and substantially outperforms the state-of-the-art sparse network models in prediction accuracy, speed and robustness.
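A minimal sketch of "even and sparse connectivity within each layer": a linear layer whose fixed binary mask gives every output neuron the same number of evenly spread incoming connections. The mask pattern and the fan-in value are illustrative assumptions.

```python
# Illustrative uniform sparse layer: each output neuron keeps exactly fan_in
# connections, spread evenly over the inputs by a fixed binary mask.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UniformSparseLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, fan_in: int = 8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        mask = torch.zeros(out_features, in_features)
        # Spread each neuron's fan_in connections evenly over the inputs.
        for i in range(out_features):
            cols = (torch.arange(fan_in) * in_features // fan_in + i) % in_features
            mask[i, cols] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The mask keeps the connectivity pattern fixed, uniform, and sparse.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


layer = UniformSparseLinear(64, 32, fan_in=8)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 32])
```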
arXiv Detail & Related papers (2020-11-29T19:00:05Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
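A minimal sketch of the learnable-edge idea: every node aggregates the outputs of all earlier nodes through sigmoid-gated scalar edge parameters, so the connectivity pattern is trained by gradient descent. The per-node operation and the node count are assumptions for illustration.

```python
# Illustrative stage with learnable connectivity: one scalar per directed edge
# of a complete DAG over the nodes, passed through a sigmoid and used to weight
# the aggregation. The 3x3 conv + ReLU node operation is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableConnectivityStage(nn.Module):
    def __init__(self, channels: int, num_nodes: int = 4):
        super().__init__()
        self.nodes = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_nodes))
        # One learnable scalar per edge (stage input plus each earlier node).
        self.edges = nn.Parameter(torch.zeros(num_nodes, num_nodes + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # index 0 is the stage input
        for j, node in enumerate(self.nodes):
            gates = torch.sigmoid(self.edges[j, : len(outputs)])
            agg = sum(g * o for g, o in zip(gates, outputs))
            outputs.append(F.relu(node(agg)))
        return outputs[-1]


stage = LearnableConnectivityStage(channels=16, num_nodes=4)
print(stage(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```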
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.