Studying inductive biases in image classification task
- URL: http://arxiv.org/abs/2210.17141v1
- Date: Mon, 31 Oct 2022 08:43:26 GMT
- Title: Studying inductive biases in image classification task
- Authors: Nana Arizumi
- Abstract summary: Self-attention (SA) structures have locally independent filters and can use large kernels, which contradicts the previously popular convolutional neural networks (CNNs)
We show that context awareness was the crucial property; however, large local information was not necessary to construct CA parameters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, self-attention (SA) structures became popular in computer vision
fields. They have locally independent filters and can use large kernels, which
contrasts with the previously popular convolutional neural networks (CNNs). The
success of CNNs was attributed to their hard-coded inductive biases of locality and
spatial invariance. However, recent studies have shown that inductive biases in
CNNs are too restrictive. On the other hand, the relative position encodings,
similar to depthwise (DW) convolution, are necessary for the local SA networks,
which indicates that the SA structures are not entirely spatially variant.
Hence, we would like to determine which part of inductive biases contributes to
the success of the local SA structures. To do so, we introduced context-aware
decomposed attention (CADA), which decomposes attention maps into multiple
trainable base kernels and accumulates them using context-aware (CA)
parameters. This way, we could identify the link between the CNNs and SA
networks. We conducted ablation studies using ResNet50 applied to the ImageNet
classification task. DW convolution can have large locality without increasing
computational costs compared to CNNs, but its accuracy saturates with larger
kernels. CADA follows this characteristic of locality. We
showed that context awareness was the crucial property; however, large local
information was not necessary to construct CA parameters. Even though removing
spatial invariance entirely makes training difficult, more relaxed spatial
invariance gave better accuracy than strict spatial invariance. Also, adding
strong spatial invariance through relative position encoding was preferable. We
extended these experiments to filters for downsampling and showed that the
locality bias is more critical for downsampling, but the strong locality bias
can be removed by using relaxed spatial invariance.
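The abstract describes CADA only at a high level, so the following is a minimal sketch of the decomposition idea, assuming a PyTorch implementation: the per-pixel attention map is formed as a weighted sum of a few trainable base kernels, with the context-aware (CA) weights predicted from the input by a 1x1 convolution. The class name, the number of base kernels, the softmax normalizations, and the depthwise-style sharing of kernels across channels are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CADASketch(nn.Module):
    """Per-pixel attention as a context-weighted sum of trainable base kernels."""

    def __init__(self, channels: int, kernel_size: int = 7, num_bases: int = 4):
        super().__init__()
        self.k = kernel_size
        # K trainable base kernels, shared across channels (depthwise-style).
        self.bases = nn.Parameter(0.02 * torch.randn(num_bases, kernel_size * kernel_size))
        # Context-aware (CA) parameters: one weight per base kernel at every pixel.
        self.to_ca = nn.Conv2d(channels, num_bases, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        ca = torch.softmax(self.to_ca(x), dim=1)               # (B, K, H, W)
        # Accumulate the base kernels into one attention map per pixel.
        attn = torch.einsum('bkhw,kn->bnhw', ca, self.bases)   # (B, k*k, H, W)
        attn = torch.softmax(attn, dim=1)                      # normalize over the window
        # Gather k x k neighborhoods and apply the spatially varying kernel.
        patches = F.unfold(x, self.k, padding=self.k // 2)     # (B, C*k*k, H*W)
        patches = patches.reshape(b, c, self.k * self.k, h * w)
        attn = attn.reshape(b, 1, self.k * self.k, h * w)
        return (patches * attn).sum(dim=2).reshape(b, c, h, w)


if __name__ == "__main__":
    layer = CADASketch(channels=64, kernel_size=7, num_bases=4)
    out = layer(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

With a single base kernel the CA weights become trivial and the layer degenerates to one fixed, spatially invariant kernel, which loosely corresponds to the depthwise-convolution end of the spectrum discussed in the abstract.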
Related papers
- Improved Generalization of Weight Space Networks via Augmentations [56.571475005291035]
Learning in deep weight spaces (DWS) is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs).
We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets.
To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces.
arXiv Detail & Related papers (2024-02-06T15:34:44Z)
- CPR++: Object Localization via Single Coarse Point Supervision [55.8671776333499]
Coarse point refinement (CPR) is the first attempt to alleviate semantic variance from an algorithmic perspective.
CPR reduces semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
CPR++ can obtain scale information and further reduce the semantic variance in a global region.
arXiv Detail & Related papers (2024-01-30T17:38:48Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Counting Varying Density Crowds Through Density Guided Adaptive Selection CNN and Transformer Estimation [25.050801798414263]
Humans tend to locate and count targets in low-density regions, and reason about their number in high-density regions.
We propose a CNN and Transformer Adaptive Selection Network (CTASNet) which can adaptively select the appropriate counting branch for different density regions.
arXiv Detail & Related papers (2022-06-21T02:05:41Z)
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting [119.83017534355842]
We try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map.
Inspired by previous work, we propose a low-rank approximation accompanied by translation invariance to efficiently approximate massive Gaussian convolution.
Our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.
arXiv Detail & Related papers (2022-06-10T17:51:25Z)
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing because speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Localized convolutional neural networks for geospatial wind forecasting [0.0]
Convolutional neural networks (CNNs) possess desirable properties for many kinds of spatial data.
In this work, we propose localized convolutional neural networks that enable CNNs to learn local features in addition to the global ones.
They can be added to any convolutional layers, easily end-to-end trained, introduce minimal additional complexity, and let CNNs retain most of their benefits to the extent that they are needed.
arXiv Detail & Related papers (2020-05-12T17:14:49Z)
- Understanding when spatial transformer networks do not support invariance, and what to do about it [0.0]
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations.
We show that STNs do not have the ability to align the feature maps of a transformed image with those of its original.
We investigate alternative STN architectures that make use of complex features.
arXiv Detail & Related papers (2020-04-24T12:20:35Z)
- Revisiting Saliency Metrics: Farthest-Neighbor Area Under Curve [23.334584322129142]
Saliency detection has been widely studied because it plays an important role in various vision applications.
It is difficult to evaluate saliency systems because each measure has its own bias.
We propose a new saliency metric based on the AUC property, which aims at sampling a more directional negative set for evaluation.
arXiv Detail & Related papers (2020-02-24T20:55:42Z)
- Revisiting Spatial Invariance with Low-Rank Local Connectivity [33.430515807834254]
In experiments with small convolutional networks, we find that relaxing spatial invariance improves classification accuracy over both convolution and locally connected layers (see the sketch after this list).
arXiv Detail & Related papers (2020-02-07T18:56:37Z)
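Relaxed spatial invariance recurs both in the paper above and in the last related entry, so here is a minimal sketch, again in PyTorch, of a low-rank locally connected layer in that spirit: a few convolutional filter banks are shared globally, and their outputs are blended with trainable per-position combining weights, so the effective filter varies across the image. The layer name, the number of filter banks, the fixed input resolution, and the softmax over combining weights are assumptions made for illustration, not the exact layer from the cited paper.

```python
import torch
import torch.nn as nn


class LowRankLocallyConnected2d(nn.Module):
    """K shared convolutional filter banks mixed by trainable per-position weights."""

    def __init__(self, in_ch: int, out_ch: int, height: int, width: int,
                 kernel_size: int = 3, num_bases: int = 4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False)
            for _ in range(num_bases)
        ])
        # One combining weight per filter bank at every spatial position.
        self.combine = nn.Parameter(torch.zeros(num_bases, height, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.combine, dim=0)            # (K, H, W)
        outs = torch.stack([conv(x) for conv in self.convs])    # (K, B, C_out, H, W)
        # Spatially varying mixture: every pixel blends the K shared filter banks.
        return (weights[:, None, None] * outs).sum(dim=0)       # (B, C_out, H, W)


if __name__ == "__main__":
    layer = LowRankLocallyConnected2d(3, 16, height=32, width=32)
    print(layer(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 16, 32, 32])
```

Setting num_bases to 1 recovers an ordinary convolution (strict spatial invariance), while larger values move the layer toward a fully locally connected one.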
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.