Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches
- URL: http://arxiv.org/abs/2007.13314v3
- Date: Fri, 7 Aug 2020 01:14:32 GMT
- Title: Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches
- Authors: Zhi Chen, Sen Wang, Jingjing Li, Zi Huang
- Abstract summary: We propose a novel framework called multi-patch generative adversarial nets (MPGAN).
MPGAN synthesises local patch features and labels unseen classes with a novel weighted voting strategy.
MPGAN has significantly greater accuracy than state-of-the-art methods.
- Score: 52.67723703088284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot learning (ZSL) is commonly used to address the very pervasive
problem of predicting unseen classes in fine-grained image classification and
other tasks. One family of solutions trains classifiers on synthesised unseen visual
samples produced by generative models from auxiliary semantic information, such
as natural language descriptions. However, for most of these models,
performance suffers from noise in the form of irrelevant image backgrounds.
Further, most methods do not allocate a calculated weight to each semantic
patch. Yet, in the real world, the discriminative power of features can be
quantified and directly leveraged to improve accuracy and reduce computational
complexity. To address these issues, we propose a novel framework called
multi-patch generative adversarial nets (MPGAN) that synthesises local patch
features and labels unseen classes with a novel weighted voting strategy. The
process begins by generating discriminative visual features from noisy text
descriptions for a set of predefined local patches using multiple specialist
generative models. The features synthesised from each patch for unseen classes
are then used to construct an ensemble of diverse supervised classifiers, each
corresponding to one local patch. A voting strategy averages the probability
distributions output from the classifiers and, given that some patches are more
discriminative than others, a discrimination-based attention mechanism helps to
weight each patch accordingly. Extensive experiments show that MPGAN has
significantly greater accuracy than state-of-the-art methods.
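The weighted voting step can be illustrated with a minimal sketch. It assumes that each patch-specific classifier has already been trained on synthesised features and outputs a probability distribution over the unseen classes for a test image. The function names, the per-patch discriminability scores and the softmax that turns those scores into patch weights are illustrative assumptions, not the paper's discrimination-based attention mechanism.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_weights(discriminability: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn per-patch discriminability scores into weights that sum to 1.

    ASSUMPTION: a plain softmax stands in for MPGAN's discrimination-based
    attention; the scores could be, e.g., each patch classifier's validation accuracy.
    """
    return softmax(discriminability, temperature)


def weighted_vote(patch_probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Fuse per-patch class distributions: p(y | x) = sum_k w_k * p_k(y | x)."""
    if len(patch_probs) != len(weights):
        raise ValueError("need exactly one weight per patch classifier")
    fused = [0.0] * len(patch_probs[0])
    for w, probs in zip(weights, patch_probs):
        for c, p in enumerate(probs):
            fused[c] += w * p
    return fused


def predict(patch_probs: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the unseen class with the highest fused probability."""
    fused = weighted_vote(patch_probs, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of three patch classifiers over two unseen classes.
    # Patch 0 (e.g. a head patch) is discriminative; patches 1 and 2 are noisy.
    probs = [[0.2, 0.8], [0.7, 0.3], [0.7, 0.3]]
    print(predict(probs, [1 / 3, 1 / 3, 1 / 3]))  # plain averaging -> 0
    print(predict(probs, [0.6, 0.2, 0.2]))        # weighted voting -> 1
    print(patch_weights([2.0, 1.0, 1.0]))         # largest weight for patch 0
```

With uniform weights (1/K for each of the K patches) the sketch reduces to plain averaging, and the two hypothetical noisy patches outvote the more discriminative one; giving that patch a larger weight flips the decision. How MPGAN actually computes its patch weights is not reproduced here.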
Related papers
- Accurate Explanation Model for Image Classifiers using Class Association Embedding [5.378105759529487]
We propose a generative explanation model that combines the advantages of global and local knowledge.
Class association embedding (CAE) encodes each sample into a pair of separated class-associated and individual codes.
A building-block coherency feature extraction algorithm is proposed that efficiently separates class-associated features from individual ones.
(2024-06-12T07:41:00Z)
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we address a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotations but only a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
(2023-08-24T17:56:46Z)
- Simplified Concrete Dropout -- Improving the Generation of Attribution Masks for Fine-grained Classification [8.330791157878137]
Fine-grained classification models are often deployed to determine animal species or individuals in automated animal monitoring systems.
Attention- or gradient-based methods are commonly used to identify regions in the image that contribute the most to the classification decision.
This paper circumvents the computational instabilities of Concrete Dropout (CD) by simplifying the CD sampling and reducing reliance on large mini-batch sizes.
(2023-07-27T13:01:49Z)
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
(2023-03-03T02:07:40Z)
- PatchMix Augmentation to Identify Causal Features in Few-shot Learning [55.64873998196191]
Few-shot learning aims to transfer knowledge learned from base categories with sufficient labelled data to novel categories with scarce labelled data.
We propose a novel data augmentation strategy, dubbed PatchMix, that can break spurious dependencies.
We show that such an augmentation mechanism, different from existing ones, is able to identify the causal features.
(2022-11-29T08:41:29Z)
- Text2Model: Text-based Model Induction for Zero-shot Image Classification [38.704831945753284]
We address the challenge of building task-agnostic classifiers using only text descriptions.
We generate zero-shot classifiers using a hypernetwork that receives class descriptions and outputs a multi-class model.
We evaluate this approach in a series of zero-shot classification tasks, for image, point-cloud, and action recognition, using a range of text descriptions.
(2022-10-27T05:19:55Z)
- Weakly Supervised Semantic Segmentation via Progressive Patch Learning [39.87150496277798]
A "Progressive Patch Learning" approach is proposed to improve the extraction of local details by the classification network.
"Patch Learning" destructs the feature maps into patches and independently processes each local patch in parallel before the final aggregation.
"Progressive Patch Learning" further extends the feature destruction and patch learning to multi-level granularities in a progressive manner.
(2022-09-16T09:54:17Z)
- Towards Unbiased Multi-label Zero-Shot Learning with Pyramid and Semantic Attention [14.855116554722489]
Multi-label zero-shot learning aims to recognise multiple unseen class labels for each input sample.
We propose a novel framework for unbiased multi-label zero-shot learning that considers various class-specific regions.
(2022-03-07T15:52:46Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
(2021-10-31T16:15:09Z)
- Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding based generative model with a tight visual-semantic coupling constraint.
We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces.
Our method can easily be extended to the transductive ZSL setting by generating labels for unseen images.
(2020-09-16T03:54:12Z)