Generalized Few-Shot Video Classification with Video Retrieval and
Feature Generation
- URL: http://arxiv.org/abs/2007.04755v2
- Date: Wed, 13 Oct 2021 13:31:06 GMT
- Title: Generalized Few-Shot Video Classification with Video Retrieval and
Feature Generation
- Authors: Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt
Schiele, Zeynep Akata
- Abstract summary: We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning aims to recognize novel classes from a few examples.
Although significant progress has been made in the image domain, few-shot video
classification is relatively unexplored. We argue that previous methods
underestimate the importance of video feature learning and propose to learn
spatiotemporal features using a 3D CNN. Using a two-stage approach that first
learns video features on base classes and then fine-tunes classifiers on novel
classes, we show that this simple baseline outperforms prior few-shot video
classification methods by over 20 points on existing benchmarks.
To circumvent the need for labeled examples, we present two novel approaches
that yield further improvement. First, we leverage tag-labeled videos from a
large dataset using tag retrieval, then select the best clips by visual
similarity. Second, we learn generative adversarial networks that generate
video features of novel classes from their semantic embeddings.
Moreover, we find existing benchmarks are limited because they focus on only 5
novel classes per testing episode; we therefore introduce more realistic
benchmarks involving more novel classes (few-shot learning) as well as a
mixture of novel and base classes (generalized few-shot learning). The
experimental results show that our retrieval and feature-generation approaches
significantly outperform the baseline on the new benchmarks.
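The two-stage baseline the abstract describes — learn video features on base classes, then fit a classifier on the few labeled novel-class examples — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: random normalized features stand in for the 3D CNN backbone trained in stage one, and a nearest-centroid classifier stands in for the fine-tuned novel-class classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(n_videos, dim=512):
    """Stand-in for a 3D CNN backbone pretrained on base classes:
    maps each video to an L2-normalized feature vector."""
    feats = rng.normal(size=(n_videos, dim))
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def fit_centroids(support_feats, support_labels, n_classes):
    """Stage two: build one prototype per novel class from the
    few labeled support examples (simplest classifier baseline)."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def predict(query_feats, centroids):
    # Cosine similarity, since query features are L2-normalized.
    sims = query_feats @ centroids.T
    return sims.argmax(axis=1)

# A 5-way 5-shot episode with 10 query videos.
support = extract_features(25)
labels = np.repeat(np.arange(5), 5)   # 5 examples per novel class
centroids = fit_centroids(support, labels, 5)
queries = extract_features(10)
preds = predict(queries, centroids)
print(preds.shape)
```

The over-20-point gain reported in the abstract comes from the quality of the stage-one features; the stage-two classifier itself can remain this simple.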
Related papers
- Less than Few: Self-Shot Video Instance Segmentation [50.637278655763616]
We propose to automatically learn to find appropriate support videos given a query.
We tackle, for the first time, video instance segmentation in a self-shot (and few-shot) setting.
We provide strong baseline performances that utilize a novel transformer-based model.
arXiv Detail & Related papers (2022-04-19T13:14:43Z)
- A Simple Approach to Adversarial Robustness in Few-shot Image Classification [20.889464448762176]
We show that a simple transfer-learning based approach can be used to train adversarially robust few-shot classifiers.
We also present a method for the novel classification task that calibrates the centroid of each few-shot category towards the base classes.
arXiv Detail & Related papers (2022-04-11T22:46:41Z)
- vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
arXiv Detail & Related papers (2022-01-23T22:14:17Z)
- A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [33.86872697028233]
We present an in-depth study on few-shot video classification by making three contributions.
First, we perform a consistent comparative study on the existing metric-based methods to figure out their limitations in representation learning.
Second, we discover that there is a high correlation between the novel action class and the ImageNet object class, which is problematic in the few-shot recognition setting.
Third, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.
arXiv Detail & Related papers (2021-10-24T06:01:46Z)
- Cross-category Video Highlight Detection via Set-based Learning [55.49267044910344]
We propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.
It learns to distinguish target-category videos and to capture the characteristics of highlight moments from the source video category.
It outperforms five typical Unsupervised Domain Adaptation (UDA) algorithms on various cross-category highlight detection tasks.
arXiv Detail & Related papers (2021-08-26T13:06:47Z)
- When Video Classification Meets Incremental Classes [12.322018693269952]
We propose a framework to address the challenge of catastrophic forgetting.
To mitigate it, we utilize some characteristics of videos. First, we preserve granularity-temporal knowledge via distillation.
Second, we propose a dual exemplar selection method to select and store representative video instances of old classes and key-frames inside videos under a tight storage budget.
arXiv Detail & Related papers (2021-06-30T06:12:33Z)
- TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification [26.12591949900602]
We formulate a text-based task conditioner to adapt video features to the few-shot learning task.
Our model obtains state-of-the-art performance on four challenging benchmarks in few-shot video action classification.
arXiv Detail & Related papers (2021-06-21T15:08:08Z)
- Learning Implicit Temporal Alignment for Few-shot Video Classification [40.57508426481838]
Few-shot video classification aims to learn new video categories with only a few labeled examples.
It is particularly challenging to learn a class-invariant spatial-temporal representation in such a setting.
We propose a novel matching-based few-shot learning strategy for video sequences in this work.
arXiv Detail & Related papers (2021-05-11T07:18:57Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class, instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.