Compositional Few-Shot Recognition with Primitive Discovery and Enhancing
- URL: http://arxiv.org/abs/2005.06047v3
- Date: Tue, 22 Sep 2020 00:03:47 GMT
- Title: Compositional Few-Shot Recognition with Primitive Discovery and Enhancing
- Authors: Yixiong Zou, Shanghang Zhang, Ke Chen, Yonghong Tian, Yaowei Wang, José M. F. Moura
- Abstract summary: Few-shot learning aims at recognizing novel classes given only a few training samples.
Humans can easily recognize novel classes with only a few samples.
We propose an approach to learn a feature representation composed of important primitives.
- Score: 43.478770119996184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning (FSL) aims at recognizing novel classes given only a
few training samples, which remains a great challenge for deep learning.
However, humans can easily recognize novel classes with only a few samples. A
key component of this ability is compositional recognition, which has been well
studied in cognitive science but is not well explored in FSL. To imitate the
human ability of learning visual primitives and composing them to recognize
novel classes, we propose an FSL approach that learns a feature representation
composed of important primitives, jointly trained with two parts: primitive
discovery and primitive enhancing. In primitive discovery, we focus on learning
primitives related to object parts through self-supervision on the order of
image splits, avoiding extra laborious annotation and alleviating the effect of
the semantic gap. In primitive enhancing, inspired by recent studies on the
interpretability of deep networks, we provide a composition view of the FSL
baseline model. To adapt this model for effective composition, motivated by
both mathematical deduction and biological studies (the Hebbian learning rule
and the winner-take-all mechanism), we propose a soft composition mechanism
that enlarges the activation of important primitives while reducing that of
others, so as to enhance the influence of important primitives and better
utilize them to compose novel classes. Extensive experiments are conducted on
public benchmarks for both few-shot image classification and few-shot video
recognition. Our method achieves state-of-the-art performance on all of these
datasets and shows better interpretability.
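To make the primitive-discovery pretext task concrete, here is a minimal sketch of a split-order prediction task in PyTorch. It is an illustration under assumptions, not the authors' released code: the choice of four vertical strips and all function names are hypothetical.

```python
# Hedged sketch of a split-order pretext task: cut each image into
# vertical strips, shuffle them, and train the network to predict the
# permutation, encouraging part-level (primitive) features.
import itertools
import torch

PERMS = list(itertools.permutations(range(4)))  # 24 orderings of 4 strips

def make_split_order_batch(images):
    """images: (B, C, H, W) -> (strip-shuffled images, permutation labels)."""
    batch_size = images.size(0)
    strips = images.chunk(4, dim=3)   # 4 vertical strips, each (B, C, H, W/4)
    labels = torch.randint(len(PERMS), (batch_size,))
    shuffled = torch.stack([
        # Reassemble sample i's strips in the order given by its label.
        torch.cat([strips[j][i] for j in PERMS[labels[i]]], dim=2)
        for i in range(batch_size)
    ])
    return shuffled, labels
```

A small pretext head (e.g., a linear layer with len(PERMS) outputs over pooled features) would then be trained with cross-entropy on these permutation labels, requiring no extra annotation.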
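The soft composition mechanism can similarly be pictured as a re-weighting of channel activations. The sketch below assumes each feature channel acts as one primitive and measures importance by a channel's contribution to the top class score; the importance measure, temperature, and function names are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of soft winner-take-all composition: amplify channels
# (primitives) that contribute most to the predicted class, suppress
# the rest. The importance measure and temperature are assumptions.
import torch
import torch.nn.functional as F

def soft_compose(features, class_weights, temperature=4.0):
    """features: (B, D) pooled activations; class_weights: (C, D)."""
    logits = features @ class_weights.t()             # (B, C) class scores
    top_class = logits.argmax(dim=1)                  # (B,) predicted class
    # Channel-wise contribution of each primitive to the top class score.
    importance = features * class_weights[top_class]  # (B, D)
    # Soft winner-take-all gate: the softmax amplifies important channels.
    gate = F.softmax(temperature * importance, dim=1) * features.size(1)
    return features * gate                            # re-weighted features
```

Scaling the softmax gate by the channel count keeps the average gate value at 1, so the mechanism redistributes activation mass toward important primitives rather than changing the overall feature magnitude.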
Related papers
- CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning [52.63674911541416]
Few-shot class-incremental learning (FSCIL) faces several challenges, such as overfitting and forgetting.
Our primary focus is representation learning on base classes to tackle the unique challenge of FSCIL.
We find that securing the spread of features within a more confined feature space enables the learned representation to strike a better balance between transferability and discriminability.
arXiv Detail & Related papers (2024-10-08T02:23:16Z)
- Data-Free Class Incremental Gesture Recognition via Synthetic Feature Sampling [10.598646625077025]
Data-free class-incremental learning (DFCIL) aims to enable models to continuously learn new classes while retaining knowledge of old classes, even when the training data for old classes is unavailable.
We developed Synthetic Feature Replay (SFR) that can sample synthetic features from class prototypes to replay for old classes and augment for new classes.
Our proposed method showcases significant advancements over the state-of-the-art, achieving up to 15% enhancements in mean accuracy across all steps.
arXiv Detail & Related papers (2024-08-21T18:44:15Z)
- Compositional Few-Shot Class-Incremental Learning [23.720973742098682]
Few-shot class-incremental learning (FSCIL) is proposed to continually learn from novel classes with only a few samples.
In contrast, humans can easily recognize novel classes with a few samples.
Cognitive science demonstrates that an important component of such human capability is compositional learning.
arXiv Detail & Related papers (2024-05-27T10:21:38Z)
- Compositional Learning in Transformer-Based Human-Object Interaction Detection [6.630793383852106]
The long-tailed distribution of labeled instances is a primary challenge in human-object interaction (HOI) detection.
Inspired by the nature of HOI triplets, some existing approaches adopt the idea of compositional learning.
We creatively propose a transformer-based framework for compositional HOI learning.
arXiv Detail & Related papers (2023-08-11T06:41:20Z)
- Visual-Semantic Contrastive Alignment for Few-Shot Image Classification [1.109560166867076]
Few-shot learning aims to train a model that can adapt to unseen visual classes with only a few labeled examples.
We introduce a contrastive alignment mechanism for visual and semantic feature vectors to learn much more generalized visual concepts.
Our method simply adds an auxiliary contrastive learning objective which captures the contextual knowledge of a visual category.
arXiv Detail & Related papers (2022-10-20T03:59:40Z)
- Learning Primitive-aware Discriminative Representations for Few-shot Learning [28.17404445820028]
Few-shot learning aims to learn a classifier that can be easily adapted to recognize novel classes with only a few labeled examples.
We propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations.
Our method achieves state-of-the-art results on six standard benchmarks.
arXiv Detail & Related papers (2022-08-20T16:22:22Z)
- CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose [70.59906971581192]
We introduce a novel prompt-based Contrastive learning scheme for connecting Language and AniMal Pose effectively.
The CLAMP attempts to bridge the gap by adapting the text prompts to the animal keypoints during network training.
Experimental results show that our method achieves state-of-the-art performance under the supervised, few-shot, and zero-shot settings.
arXiv Detail & Related papers (2022-06-23T14:51:42Z)
- SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot Learning [85.2093650907943]
We propose SEmantic Guided Attention (SEGA) to teach machines to recognize a new category.
SEGA uses semantic knowledge to guide the visual perception in a top-down manner about what visual features should be paid attention to.
We show that our semantic guided attention achieves the anticipated function and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-11-08T08:03:44Z)
- Partner-Assisted Learning for Few-Shot Image Classification [54.66864961784989]
Few-shot learning has been studied to mimic human visual capabilities and learn effective models without the need for exhaustive human annotation.
In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
We propose a two-stage training scheme, which first trains a partner encoder to model pair-wise similarities and extract features serving as soft-anchors, and then trains a main encoder by aligning its outputs with soft-anchors while attempting to maximize classification performance.
arXiv Detail & Related papers (2021-09-15T22:46:19Z)
- Class-Balanced Distillation for Long-Tailed Visual Recognition [100.10293372607222]
Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions.
In this work, we introduce a new framework based on the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting.
Our main contribution is a new training method, that leverages knowledge distillation to enhance feature representations.
arXiv Detail & Related papers (2021-04-12T08:21:03Z)