ResNeSt: Split-Attention Networks
- URL: http://arxiv.org/abs/2004.08955v2
- Date: Wed, 30 Dec 2020 05:05:15 GMT
- Title: ResNeSt: Split-Attention Networks
- Authors: Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi
Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
- Abstract summary: We present a modularized architecture that applies channel-wise attention across different network branches to capture cross-feature interactions and learn diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image classification.
- Score: 86.25490825631763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that feature-map attention and multi-path representation are
important for visual recognition. In this paper, we present a modularized
architecture that applies channel-wise attention across different network
branches to capture cross-feature interactions and learn diverse
representations. Our design results in a simple and unified computation block,
which can be parameterized using only a few variables. Our model, named
ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image
classification. In addition, ResNeSt has achieved superior transfer learning
results on several public benchmarks when serving as the backbone, and has
been adopted by the winning entries of the COCO-LVIS challenge. The source
code for the complete system and the pretrained models is publicly available.
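The block described in the abstract admits a compact sketch. Below is a minimal PyTorch rendering of the Split-Attention idea (cardinality groups omitted for brevity); the module name, the `radix` and `reduction` defaults, and the layer sizes are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttention(nn.Module):
    """Minimal Split-Attention sketch: radix feature splits share a
    channel-wise attention computed from their pooled sum."""
    def __init__(self, channels: int, radix: int = 2, reduction: int = 4):
        super().__init__()
        self.radix = radix
        inter = max(channels // reduction, 32)
        # One convolution produces all radix splits at once.
        self.conv = nn.Conv2d(channels, channels * radix, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels * radix)
        self.fc1 = nn.Conv2d(channels, inter, 1)
        self.fc2 = nn.Conv2d(inter, channels * radix, 1)

    def forward(self, x):
        b, c = x.shape[0], x.shape[1]
        splits = F.relu(self.bn(self.conv(x)))                    # (b, c*radix, h, w)
        splits = splits.view(b, self.radix, c, *x.shape[2:])      # (b, radix, c, h, w)
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)    # pooled sum, (b, c, 1, 1)
        att = self.fc2(F.relu(self.fc1(gap)))                     # (b, c*radix, 1, 1)
        att = F.softmax(att.view(b, self.radix, c, 1, 1), dim=1)  # softmax over splits
        return (att * splits).sum(dim=1)                          # (b, c, h, w)
```

A call such as `SplitAttention(64)(torch.randn(2, 64, 32, 32))` returns a tensor of the same shape, so the block can stand in for the 3x3 convolution of a ResNet bottleneck.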
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, identifying the features most relevant for distinguishing between two images from limited samples is challenging.
We propose an intra-task mutual attention method for few-shot learning that splits the support and query samples into patches.
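A loose sketch of patch-level mutual attention, assuming standard cross-attention between support and query patch tokens; the embedding size, head count, and token count here are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

# Support and query images represented as 14x14 = 196 patch tokens.
attn = nn.MultiheadAttention(embed_dim=384, num_heads=6, batch_first=True)
support = torch.randn(1, 196, 384)
query = torch.randn(1, 196, 384)

# Mutual attention: query tokens attend to support tokens, and vice versa.
q2s, _ = attn(query, support, support)
s2q, _ = attn(support, query, query)
```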
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification [7.643377057724898]
We present a unified framework called deep dependency networks (DDNs).
DDNs combine dependency networks and deep learning architectures for multi-label classification.
A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes.
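As a rough illustration of the dependency-network side, the sketch below predicts each label from deep features together with the current guesses for all other labels, and runs a simple fixed-point loop where the paper would use a proper inference scheme; every name and size here is an assumption.

```python
import torch
import torch.nn as nn

class DependencyHead(nn.Module):
    """Each label is conditioned on the features and on the other labels."""
    def __init__(self, feat_dim: int, num_labels: int):
        super().__init__()
        self.cond = nn.Linear(feat_dim + num_labels, num_labels)

    def forward(self, feats, labels):
        return torch.sigmoid(self.cond(torch.cat([feats, labels], dim=-1)))

head = DependencyHead(feat_dim=512, num_labels=20)
feats = torch.randn(4, 512)      # features from a deep backbone
y = torch.zeros(4, 20)           # initial label guesses
for _ in range(10):              # fixed-point updates standing in for Gibbs sampling
    y = head(feats, y)
```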
arXiv Detail & Related papers (2024-04-17T18:04:37Z)
- Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances [49.631908848868505]
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning.
We investigate the differences in CLIP performance among various neural architectures.
We propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34%.
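The summary does not spell out the combination rule; the simplest instance is to average per-backbone class probabilities, as in this sketch (the two random logit tensors stand in for, say, a ResNet-based and a ViT-based CLIP head over 1000 classes).

```python
import torch

def ensemble(logits_per_backbone):
    # Average the class probabilities predicted by each backbone.
    probs = [logits.softmax(dim=-1) for logits in logits_per_backbone]
    return torch.stack(probs).mean(dim=0)

combined = ensemble([torch.randn(8, 1000), torch.randn(8, 1000)])
```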
arXiv Detail & Related papers (2023-12-22T03:01:41Z)
- Deep Dependency Networks for Multi-Label Classification [24.24496964886951]
We show that the performance of previous approaches that combine Markov Random Fields with neural networks can be modestly improved.
We propose a new modeling framework called deep dependency networks, which augments a dependency network.
Despite its simplicity, jointly learning this new architecture yields significant improvements in performance.
arXiv Detail & Related papers (2023-02-01T17:52:40Z)
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
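For intuition, a power-normalized second-order representation can be sketched in a few lines: form the autocorrelation matrix of an image's local descriptors, then apply element-wise power normalization (the exponent `gamma` is an illustrative choice, not the paper's setting).

```python
import torch

def second_order(features: torch.Tensor, gamma: float = 0.5) -> torch.Tensor:
    # features: (n, d) local descriptors from one image
    m = features.T @ features / features.shape[0]   # (d, d) second-order matrix
    return m.sign() * m.abs().pow(gamma)            # element-wise power normalization
```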
arXiv Detail & Related papers (2022-01-15T19:49:00Z)
- Embedded Self-Distillation in Compact Multi-Branch Ensemble Network for Remote Sensing Scene Classification [17.321718779142817]
We propose a multi-branch ensemble network to enhance the feature representation ability.
We embed a self-distillation (SD) method to transfer knowledge from the ensemble network to its main branch.
Results show that our proposed ESD-MBENet achieves better accuracy than previous state-of-the-art (SOTA) complex models.
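A minimal sketch of such a self-distillation loss, assuming the standard softened-KL formulation (the temperature `T` is an assumption; the paper's exact loss may differ).

```python
import torch
import torch.nn.functional as F

def self_distill_loss(main_logits, ensemble_logits, T: float = 4.0):
    # The main branch (student) matches the softened ensemble (teacher) output.
    p_teacher = F.softmax(ensemble_logits.detach() / T, dim=-1)
    log_p_student = F.log_softmax(main_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```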
arXiv Detail & Related papers (2021-04-01T03:08:52Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
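One way to picture such a block: a spatial path that mixes joint features through a (here learnable) adjacency matrix, and a temporal path that convolves over frames, with the two outputs aggregated. The sketch below is a single-granularity stand-in for the paper's inception-style design; all names and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        self.A = nn.Parameter(torch.eye(num_joints))   # learnable joint adjacency
        self.spatial = nn.Conv2d(channels, channels, 1)
        self.temporal = nn.Conv2d(channels, channels, (9, 1), padding=(4, 0))

    def forward(self, x):                              # x: (batch, channels, frames, joints)
        s = torch.einsum("bctv,vw->bctw", self.spatial(x), self.A)  # spatial path
        t = self.temporal(x)                                        # temporal path
        return s + t                                                # aggregate both paths
```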
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA), which is attached to each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
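As a loose stand-in for attention at an interleaved node, the sketch below predicts per-channel gates from two concatenated feature states and mixes them; the paper's asymmetric co-attention is more elaborate, and every name here is hypothetical.

```python
import torch
import torch.nn as nn

class FuseAttention(nn.Module):
    """Gate two feature states into one, channel by channel."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, a, b):
        w = self.gate(torch.cat([a, b], dim=1))   # per-channel mixing weights
        return w * a + (1 - w) * b
```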
arXiv Detail & Related papers (2020-10-29T15:32:00Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric-learning-based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
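One simple instance of region-level comparison, assuming cosine similarity between query region features and a class prototype (an illustration of the idea, not the paper's exact RCN formulation).

```python
import torch
import torch.nn.functional as F

def region_scores(query_regions: torch.Tensor, class_proto: torch.Tensor) -> torch.Tensor:
    # query_regions: (r, d) region features; class_proto: (d,) class embedding
    # Per-region similarities expose which regions drive the prediction.
    return F.cosine_similarity(query_regions, class_proto.unsqueeze(0), dim=-1)
```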
arXiv Detail & Related papers (2020-09-08T07:29:05Z)