Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning
- URL: http://arxiv.org/abs/2408.17083v1
- Date: Fri, 30 Aug 2024 08:13:06 GMT
- Title: Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning
- Authors: Fengyuan Dai, Siteng Huang, Min Zhang, Biao Gong, Donglin Wang
- Abstract summary: We propose Focus-Consistent Multi-Level Aggregation (FOMA), which incorporates a Multi-Level Feature Aggregation (MFA) module to generate personalized features for each branch based on the image content.
- Score: 34.133790456747626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To transfer knowledge from seen attribute-object compositions to recognize unseen ones, recent compositional zero-shot learning (CZSL) methods mainly discuss the optimal classification branches to identify the elements, leading to the popularity of employing a three-branch architecture. However, these methods mix up the underlying relationship among the branches, in the aspect of consistency and diversity. Specifically, consistently providing the highest-level features for all three branches increases the difficulty in distinguishing classes that are superficially similar. Furthermore, a single branch may focus on suboptimal regions when spatial messages are not shared between the personalized branches. Recognizing these issues and endeavoring to address them, we propose a novel method called Focus-Consistent Multi-Level Aggregation (FOMA). Our method incorporates a Multi-Level Feature Aggregation (MFA) module to generate personalized features for each branch based on the image content. Additionally, a Focus-Consistent Constraint encourages a consistent focus on the informative regions, thereby implicitly exchanging spatial information between all branches. Extensive experiments on three benchmark datasets (UT-Zappos, C-GQA, and Clothing16K) demonstrate that our FOMA outperforms SOTA.
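The abstract's two components can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: the actual FOMA architecture, gating design, and loss formulation are not given in this summary. Here, the MFA module mixes multi-level backbone features with content-dependent weights (one personalized mixture per branch), and the Focus-Consistent Constraint is approximated as a pairwise L1 penalty between the branches' normalized spatial attention maps.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mfa(levels, gate_weights):
    """Multi-Level Feature Aggregation (sketch).

    levels: list of L feature maps, each (C, H, W), taken from
        different backbone depths.
    gate_weights: (L, C) branch-specific projection used to score each
        level from the image content (hypothetical gating scheme).
    Returns one aggregated (C, H, W) feature map for this branch.
    """
    stacked = np.stack(levels)                     # (L, C, H, W)
    content = stacked.mean(axis=(2, 3))            # global average pool -> (L, C)
    scores = (content * gate_weights).sum(axis=1)  # one score per level -> (L,)
    alpha = softmax(scores)                        # content-based mixing weights
    return np.tensordot(alpha, stacked, axes=1)    # weighted sum over levels

def focus_consistency_loss(attn_maps):
    """Penalize pairwise divergence between branch attention maps.

    attn_maps: list of (H, W) non-negative maps, one per branch; each is
        normalized to sum to 1 so the branches are compared as focus
        distributions over spatial locations.
    """
    maps = [m / m.sum() for m in attn_maps]
    loss, pairs = 0.0, 0
    for i in range(len(maps)):
        for j in range(i + 1, len(maps)):
            loss += np.abs(maps[i] - maps[j]).sum()  # L1 distance between foci
            pairs += 1
    return loss / max(pairs, 1)
```

In a three-branch CZSL setup (attribute, object, composition), `mfa` would be called once per branch with that branch's `gate_weights`, so each branch receives its own mixture of low- and high-level features while the consistency loss keeps all branches attending to the same informative regions.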
Related papers
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- Integrative Few-Shot Learning for Classification and Segmentation [37.50821005917126]
We introduce the integrative task of few-shot classification and segmentation (FS-CS).
FS-CS aims to classify and segment target objects in a query image when the target classes are given with a few examples.
We propose the integrative few-shot learning framework for FS-CS, which trains a learner to construct class-wise foreground maps.
arXiv Detail & Related papers (2022-03-29T16:14:40Z)
- SATS: Self-Attention Transfer for Continual Semantic Segmentation [50.51525791240729]
Continual semantic segmentation suffers from the same catastrophic forgetting issue as continual classification learning.
This study proposes to transfer a new type of information relevant to knowledge, i.e. the relationships between elements within each image.
The relationship information can be effectively obtained from the self-attention maps in a Transformer-style segmentation model.
arXiv Detail & Related papers (2022-03-15T06:09:28Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counter-part of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization [66.66545680550782]
We present a framework named FAC-Net, on which three branches are appended, named class-wise foreground classification branch, class-agnostic attention branch and multiple instance learning branch.
First, our class-wise foreground classification branch regularizes the relation between actions and foreground to maximize the foreground-background separation.
Besides, the class-agnostic attention branch and multiple instance learning branch are adopted to regularize the foreground-action consistency and help to learn a meaningful foreground.
arXiv Detail & Related papers (2021-08-14T12:34:44Z)
- Contrastive Multi-Modal Clustering [22.117014300127423]
We propose Contrastive Multi-Modal Clustering (CMMC) which can mine high-level semantic information via contrastive learning.
CMMC has good scalability and outperforms state-of-the-art multi-modal clustering methods.
arXiv Detail & Related papers (2021-06-21T15:32:34Z)
- Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.