Multi-level Second-order Few-shot Learning
- URL: http://arxiv.org/abs/2201.05916v1
- Date: Sat, 15 Jan 2022 19:49:00 GMT
- Title: Multi-level Second-order Few-shot Learning
- Authors: Hongguang Zhang, Hongdong Li, Piotr Koniusz
- Abstract summary: We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
- Score: 111.0648869396828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a Multi-level Second-order (MlSo) few-shot learning network for
supervised or unsupervised few-shot image classification and few-shot action
recognition. We leverage so-called power-normalized second-order base learner
streams combined with features that express multiple levels of visual
abstraction, and we use self-supervised discriminating mechanisms. As
Second-order Pooling (SoP) is popular in image recognition, we employ its basic
element-wise variant in our pipeline. The goal of multi-level feature design is
to extract feature representations at different layer-wise levels of the CNN,
realizing several levels of visual abstraction to achieve robust few-shot
learning. As SoP can handle convolutional feature maps of varying spatial
sizes, we also introduce image inputs at multiple spatial scales into MlSo. To
exploit the discriminative information from multi-level and multi-scale
features, we develop a Feature Matching (FM) module that reweights their
respective branches. We also introduce a self-supervised step, which is a
discriminator of the spatial level and the scale of abstraction. Our pipeline
is trained in an end-to-end manner. With a simple architecture, we demonstrate
respectable results on standard datasets such as Omniglot, mini-ImageNet,
tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford
Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and
mini-MIT.
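Since the abstract names two concrete operations, element-wise power-normalized Second-order Pooling and Feature Matching reweighting, a short sketch helps make them tangible. The following is a minimal PyTorch sketch under stated assumptions: the signed square-root normalizer, the linear scoring head, and all shapes are illustrative choices, not the authors' exact design.

```python
# Minimal sketch of power-normalized second-order pooling (SoP) plus a
# simple branch-reweighting step in the spirit of the FM module. The
# normalizer, scoring head, and shapes are assumptions for illustration.
import torch
import torch.nn as nn

def second_order_pool(feat: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) conv feature map -> (B, C, C) second-order matrix."""
    b, c, h, w = feat.shape
    phi = feat.flatten(2)                        # (B, C, N), N = H * W
    m = phi @ phi.transpose(1, 2) / (h * w)      # spatial autocorrelation
    # Element-wise power normalization (signed square root assumed here).
    return torch.sign(m) * torch.sqrt(m.abs() + 1e-12)

class FeatureMatching(nn.Module):
    """Hypothetical FM-style module: scores each branch (level or scale)
    and reweights the pooled representations before fusing them."""
    def __init__(self, channels: int, num_branches: int):
        super().__init__()
        self.num_branches = num_branches
        self.score = nn.Linear(channels * channels, 1)

    def forward(self, branch_mats):              # list of (B, C, C)
        assert len(branch_mats) == self.num_branches
        flat = torch.stack([m.flatten(1) for m in branch_mats], dim=1)
        w = torch.softmax(self.score(flat), dim=1)   # (B, K, 1) branch weights
        return (w * flat).sum(dim=1)                 # weighted fusion

# Usage: pool two feature levels of different spatial size, then fuse.
f1, f2 = torch.randn(4, 64, 16, 16), torch.randn(4, 64, 8, 8)
fm = FeatureMatching(channels=64, num_branches=2)
fused = fm([second_order_pool(f1), second_order_pool(f2)])
print(fused.shape)  # torch.Size([4, 4096])
```

Because SoP maps any H x W feature map to a fixed C x C matrix, branches taken from different CNN levels and image scales become directly comparable, which is what makes the multi-level, multi-scale fusion above straightforward.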
Related papers
- HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification [15.129037250680582]
Tight visual-linguistic interactions play a vital role in improving classification performance.
Recent Transformer-based methods have achieved great success in multi-label image classification.
We propose a Hierarchical Scale-Aware Vision-Language Transformer (HSVLT) with two appealing designs.
arXiv Detail & Related papers (2024-07-23T07:31:42Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can mitigate the data-hungry training requirements of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$^{2}$SNet) to handle diverse segmentation tasks on medical images; the subtraction idea is sketched below.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets spanning four medical image segmentation tasks.
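As a rough illustration of the subtraction idea named in the M$^{2}$SNet title, here is a minimal sketch of a unit that differences feature maps from adjacent encoder scales; the conv stack, upsampling mode, and absolute difference are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch of a multi-scale subtraction unit: differencing feature
# maps from adjacent encoder levels to highlight complementary detail.
# The conv stack and upsampling mode are assumptions, not M^2SNet's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_coarse, f_fine):
        # Upsample the coarser map to the finer resolution, then take the
        # element-wise absolute difference before a small conv block.
        f_coarse = F.interpolate(f_coarse, size=f_fine.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return self.conv(torch.abs(f_coarse - f_fine))

su = SubtractionUnit(32)
out = su(torch.randn(2, 32, 8, 8), torch.randn(2, 32, 16, 16))
print(out.shape)  # torch.Size([2, 32, 16, 16])
```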
arXiv Detail & Related papers (2023-03-20T06:26:49Z)
- Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
- Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition [24.406654146411682]
This work builds on the Vision Transformer (ViT).
Our goal is to leverage ViT's patch tokens and self-attention mechanism to mine rich instances in multi-label images.
We propose a weakly supervised object localization-based approach to extract multi-scale local features.
arXiv Detail & Related papers (2022-04-22T14:38:40Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
The meta-learning challenge of low-shot image recognition arises when only a few annotated images are available for learning a recognition model for a category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments for few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ); the core idea is sketched below.
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
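The core mechanism lends itself to a compact sketch: a deep-equilibrium layer outputs a fixed point z* = f(z*, x). MDEQ solves such fixed points jointly over several feature resolutions using a root finder and exact implicit differentiation; the sketch below simplifies to a single resolution, plain fixed-point iteration, and a one-step gradient approximation, all of which are assumptions for brevity.

```python
# Minimal sketch of the deep-equilibrium idea: the layer output is a fixed
# point z* = f(z*, x). Plain fixed-point iteration and a one-step gradient
# stand in for MDEQ's root solver and exact implicit differentiation.
import torch
import torch.nn as nn

class TinyDEQ(nn.Module):
    def __init__(self, dim: int, iters: int = 30):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.iters = iters

    def f(self, z, x):
        return torch.tanh(self.lin(z) + x)

    def forward(self, x):
        z = torch.zeros_like(x)
        with torch.no_grad():             # iterate to (near) equilibrium
            for _ in range(self.iters):   # without storing the graph
                z = self.f(z, x)
        return self.f(z, x)               # one differentiable step through f

deq = TinyDEQ(dim=16)
y = deq(torch.randn(4, 16))
y.sum().backward()                        # gradients flow via the last step
```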
arXiv Detail & Related papers (2020-06-15T18:07:44Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture that applies channel-wise attention across different network branches to capture cross-feature interactions and learn diverse representations; a simplified sketch follows.
Our model, named ResNeSt, outperforms EfficientNet in accuracy/latency trade-offs on image classification.
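A minimal sketch of the split-attention idea, under simplifying assumptions: parallel branches produce feature maps, a global descriptor yields softmax weights across the branches, and the block returns their weighted sum. The branch convolutions, the reduction factor, and the omission of cardinality groups are illustrative choices, not the exact ResNeSt block.

```python
# Minimal sketch of split attention: a global descriptor assigns softmax
# weights across parallel branches, whose outputs are summed with those
# weights. Branch convs and reduction factor are illustrative assumptions.
import torch
import torch.nn as nn

class SplitAttention(nn.Module):
    def __init__(self, channels: int, radix: int = 2, reduction: int = 4):
        super().__init__()
        self.radix = radix
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(radix))
        inner = max(channels // reduction, 8)
        self.fc1 = nn.Conv2d(channels, inner, 1)
        self.fc2 = nn.Conv2d(inner, channels * radix, 1)

    def forward(self, x):                      # x: (B, C, H, W)
        feats = [b(x) for b in self.branches]  # R branches, each (B, C, H, W)
        gap = sum(feats).mean(dim=(2, 3), keepdim=True)   # global descriptor
        attn = self.fc2(torch.relu(self.fc1(gap)))        # (B, C*R, 1, 1)
        attn = attn.view(x.size(0), self.radix, -1, 1, 1).softmax(dim=1)
        return sum(a.squeeze(1) * f
                   for a, f in zip(attn.split(1, dim=1), feats))

sa = SplitAttention(32)
print(sa(torch.randn(2, 32, 16, 16)).shape)    # torch.Size([2, 32, 16, 16])
```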
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.