Learning Dynamic Alignment via Meta-filter for Few-shot Learning
- URL: http://arxiv.org/abs/2103.13582v1
- Date: Thu, 25 Mar 2021 03:29:33 GMT
- Title: Learning Dynamic Alignment via Meta-filter for Few-shot Learning
- Authors: Chengming Xu, Chen Liu, Li Zhang, Chengjie Wang, Jilin Li, Feiyue
Huang, Xiangyang Xue, Yanwei Fu
- Abstract summary: Few-shot learning aims to recognise new classes by adapting the learned knowledge with extremely limited few-shot (support) examples.
We learn a dynamic alignment, which can effectively highlight both query regions and channels according to different local support information.
The resulting framework establishes a new state of the art on major few-shot visual recognition benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot learning (FSL), which aims to recognise new classes by adapting
learned knowledge from extremely limited few-shot (support) examples, remains
an important open problem in computer vision. Most existing methods for
feature alignment in few-shot learning consider only image-level or
spatial-level alignment while omitting channel disparity. Our insight is
that such methods lead to poor adaptation through redundant matching, and
that channel-wise adjustment is key to adapting the learned knowledge to new
classes. Therefore, in this paper, we propose to learn a dynamic alignment,
which can effectively highlight both query regions and channels according to
different local support information. Specifically, this is achieved by first
dynamically sampling the neighbourhood of each feature position conditioned on
the input few-shot examples, based on which we then predict a Dynamic
Meta-filter that is both position-dependent and channel-dependent. The filter
is used to align the query feature with position-specific and channel-specific
knowledge. Moreover, we adopt Neural Ordinary Differential Equations (ODEs) to
enable more accurate control of the alignment. In this way our model is able
to better capture the fine-grained semantic context of the few-shot examples
and thus facilitates dynamic knowledge adaptation for few-shot learning. The
resulting framework establishes a new state of the art on major few-shot
visual recognition benchmarks, including miniImageNet and tieredImageNet.
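The core alignment step described above can be illustrated with a minimal NumPy sketch. This is a simplified stand-in, not the authors' implementation: here the filter weights and the neighbourhood offsets are taken as given inputs, whereas in the paper both are predicted from the support features, and the alignment is further controlled with a Neural ODE solver. The function name and shapes below are illustrative assumptions.

```python
import numpy as np

def dynamic_meta_filter_align(query, flt, offsets):
    """Align a query feature map with a position- and channel-dependent
    dynamic filter, in the spirit of the paper's Dynamic Meta-filter.

    query   : (C, H, W) query feature map
    flt     : (C, K, H, W) filter weights, one per channel, per sampled
              neighbourhood offset, per spatial position (assumed to be
              predicted from the support features in the actual method)
    offsets : K integer (dy, dx) neighbourhood offsets
    """
    C, H, W = query.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    # Zero-pad so out-of-bounds neighbours contribute nothing.
    padded = np.pad(query, ((0, 0), (pad, pad), (pad, pad)))
    aligned = np.zeros_like(query)
    for k, (dy, dx) in enumerate(offsets):
        # neighbour[c, h, w] == query[c, h + dy, w + dx] (zero outside).
        neighbour = padded[:, pad + dy:pad + dy + H, pad + dx:pad + dx + W]
        # Channel- and position-specific weighting of each neighbour.
        aligned += flt[:, k] * neighbour
    return aligned

# Toy usage with a hypothetical fixed 3x3 neighbourhood.
rng = np.random.default_rng(0)
C, H, W = 4, 5, 5
offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
query = rng.standard_normal((C, H, W))
flt = rng.standard_normal((C, len(offsets), H, W))
out = dynamic_meta_filter_align(query, flt, offsets)
assert out.shape == (C, H, W)
```

Because the weights vary with both channel and position, this reduces to a per-pixel depthwise convolution; with a filter that puts weight 1 on the (0, 0) offset everywhere, the alignment is the identity.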
Related papers
- Context-Based Visual-Language Place Recognition
A popular approach to vision-based place recognition relies on low-level visual features.
We introduce a novel VPR approach that remains robust to scene changes and does not require additional training.
Our method constructs semantic image descriptors by extracting pixel-level embeddings using a zero-shot, language-driven semantic segmentation model.
arXiv Detail & Related papers (2024-10-25T06:59:11Z)
- Locality Alignment Improves Vision-Language Models
Vision-language models (VLMs) have seen growing adoption in recent years, but many still struggle with basic spatial reasoning errors.
We propose a new, efficient post-training stage for ViTs called locality alignment.
We show that locality-aligned backbones improve performance across a range of benchmarks.
arXiv Detail & Related papers (2024-10-14T21:01:01Z)
- Siamese Transformer Networks for Few-shot Image Classification
Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples.
Existing few-shot image classification methods often emphasize either global or local features, with few studies considering the integration of both.
We propose a novel approach based on the Siamese Transformer Network (STN).
Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules.
arXiv Detail & Related papers (2024-07-16T14:27:23Z)
- Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning
We propose a simple yet effective framework, named Learning Prompt with Distribution-based Feature Replay (LP-DiF).
To prevent the learnable prompt from forgetting old knowledge in a new session, we propose a pseudo-feature replay approach.
When progressing to a new session, pseudo-features sampled from old-class distributions are combined with training images of the current session to optimize the prompt.
arXiv Detail & Related papers (2024-01-03T07:59:17Z)
- Weakly-supervised Representation Learning for Video Alignment and Analysis
This paper introduces LRProp, a novel weakly-supervised representation learning approach.
The proposed algorithm also uses a regularized SoftDTW loss for better tuning of the learned features.
Our novel representation learning paradigm consistently outperforms the state of the art on temporal alignment tasks.
arXiv Detail & Related papers (2023-02-08T14:01:01Z)
- Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
Few-shot learning aims to learn a classifier that can be easily adapted to accommodate new tasks not seen during training.
Recent methods tend to collectively use a set of local features to densely represent an image instead of using a mixed global feature.
arXiv Detail & Related papers (2021-06-10T06:16:00Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weighting method in the two-stage learning to balance the class prior.
Our approach achieves state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge known as low-shot image recognition arises when only a few annotated images are available for learning a recognition model for a category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments on few-shot learning with two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- Region Comparison Network for Interpretable Few-shot Image Classification
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric-learning-based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize interpretability from the level of tasks to the level of categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.