Alignment-Uniformity aware Representation Learning for Zero-shot Video
Classification
- URL: http://arxiv.org/abs/2203.15381v1
- Date: Tue, 29 Mar 2022 09:21:22 GMT
- Title: Alignment-Uniformity aware Representation Learning for Zero-shot Video
Classification
- Authors: Shi Pu and Kaili Zhao and Mao Zheng
- Abstract summary: This paper presents an end-to-end framework that preserves alignment and uniformity properties for representations on both seen and unseen classes.
Experiments show that our method significantly outperforms SoTA by relative improvements of 28.1% on UCF101 and 27.0% on HMDB51.
- Score: 3.6954802719347413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most methods tackle zero-shot video classification by aligning
visual-semantic representations within seen classes, which limits
generalization to unseen classes. To enhance model generalizability, this paper
presents an end-to-end framework that preserves alignment and uniformity
properties for representations on both seen and unseen classes. Specifically,
we formulate a supervised contrastive loss to simultaneously align
visual-semantic features (i.e., alignment) and encourage the learned features
to distribute uniformly (i.e., uniformity). Unlike existing methods that
consider only alignment, we enforce uniformity to preserve the maximal
information of existing features, which increases the probability that
unobserved features fall near observed data. Further, we synthesize features of
unseen classes with a class generator that interpolates and extrapolates the
features of seen classes. In addition, we introduce two metrics, closeness and
dispersion, to quantify the two properties and serve as new measurements of model
generalizability. Experiments show that our method significantly outperforms
SoTA by relative improvements of 28.1% on UCF101 and 27.0% on HMDB51. Code is
available.
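The alignment and uniformity properties described in the abstract can be made concrete with a minimal NumPy sketch. This is an illustrative reconstruction, not the paper's exact supervised contrastive loss: it assumes L2-normalized visual and semantic features, and the temperature `t` is a conventional choice rather than a value taken from the paper.

```python
import numpy as np

def normalize(x):
    """L2-normalize each row (feature vector) onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def alignment_loss(visual, semantic):
    """Mean squared distance between paired visual/semantic features.

    Lower is better: a visual feature and its class semantic should
    coincide on the hypersphere (the alignment property).
    """
    return np.mean(np.sum((visual - semantic) ** 2, axis=1))

def uniformity_loss(feats, t=2.0):
    """Log of the mean pairwise Gaussian potential over distinct pairs.

    Lower is better: features should spread uniformly over the
    hypersphere, preserving maximal information (the uniformity property).
    """
    n = feats.shape[0]
    sq_dists = np.sum((feats[:, None, :] - feats[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(n, k=1)  # indices of distinct pairs only
    return np.log(np.mean(np.exp(-t * sq_dists[iu])))
```

In the paper the two properties are optimized jointly inside one supervised contrastive loss; they are split into separate functions here only to make each term visible.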
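The class generator is described only at the level of interpolating and extrapolating seen-class features. One plausible sketch is an affine combination of two seen-class feature vectors; the coefficient ranges and sampling scheme below are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def synthesize_feature(feat_a, feat_b, alpha):
    """Affine combination of two seen-class features.

    alpha in (0, 1) interpolates between the two classes; alpha outside
    [0, 1] extrapolates beyond them, yielding candidate features for
    unseen classes.
    """
    return alpha * feat_a + (1.0 - alpha) * feat_b

def synthesize_batch(seen_feats, n_samples, low=-0.5, high=1.5, seed=0):
    """Draw random seen-class pairs and mixing coefficients (assumed ranges)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_samples):
        i, j = rng.choice(len(seen_feats), size=2, replace=False)
        alpha = rng.uniform(low, high)
        out.append(synthesize_feature(seen_feats[i], seen_feats[j], alpha))
    return np.stack(out)
```

With `alpha = 0.5` this returns the midpoint of the two class features; with `alpha = 1.2` it pushes past the first class, which is where novel-class candidates come from.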
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z) - Uniformly Distributed Category Prototype-Guided Vision-Language
Framework for Long-Tail Recognition [11.110124286206467]
We propose a uniformly distributed category prototype-guided vision-language framework to effectively mitigate the feature-space bias caused by data imbalance.
Our method outperforms previous vision-language methods for long-tailed learning by a large margin and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-24T03:21:28Z) - CAD: Co-Adapting Discriminative Features for Improved Few-Shot
Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z) - Dual Prototypical Contrastive Learning for Few-shot Semantic
Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z) - Concurrent Discrimination and Alignment for Self-Supervised Feature
Learning [52.213140525321165]
Existing self-supervised learning methods learn by means of pretext tasks that are either (1) discriminative, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be drawn together.
In this work, we combine the positive aspects of the discriminating and aligning methods, and design a hybrid method that addresses the above issue.
Our method specifies the repulsion and attraction mechanisms explicitly: repulsion via a discriminative predictive task, and attraction by concurrently maximizing mutual information between paired views.
Our experiments on nine established benchmarks show that the proposed model consistently outperforms existing state-of-the-art self-supervised and transfer learning results.
arXiv Detail & Related papers (2021-08-19T09:07:41Z) - CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action
Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z) - Entropy-Based Uncertainty Calibration for Generalized Zero-Shot Learning [49.04790688256481]
The goal of generalized zero-shot learning (GZSL) is to recognise both seen and unseen classes.
Most GZSL methods typically learn to synthesise visual representations from semantic information on the unseen classes.
We propose a novel framework that leverages dual variational autoencoders with a triplet loss to learn discriminative latent features.
arXiv Detail & Related papers (2021-01-09T05:21:27Z) - Bidirectional Mapping Coupled GAN for Generalized Zero-Shot Learning [7.22073260315824]
Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data.
Learning a joint distribution of seen and unseen domains while preserving domain distinction is crucial for these methods.
In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling.
arXiv Detail & Related papers (2020-12-30T06:11:29Z) - Learning and Evaluating Representations for Deep One-class
Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.