When Video Classification Meets Incremental Classes
- URL: http://arxiv.org/abs/2106.15827v1
- Date: Wed, 30 Jun 2021 06:12:33 GMT
- Title: When Video Classification Meets Incremental Classes
- Authors: Hanbin Zhao, Xin Qin, Shihao Su, Zibo Lin, Xi Li
- Abstract summary: We propose a framework to address the challenge of catastrophic forgetting in class-incremental video classification.
To better alleviate it, we utilize some characteristics of videos. First, we decompose the spatio-temporal knowledge before distillation rather than treating it as a whole.
Second, we propose a dual granularity exemplar selection method to select and store representative video instances of old classes and key-frames inside videos under a tight storage budget.
- Score: 12.322018693269952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of social media, a tremendous number of
videos with new classes are generated daily, which raises an urgent demand for
video classification methods that can continuously incorporate new classes while
maintaining the knowledge of old videos with limited storage and computing
resources. In this paper, we summarize this task as Class-Incremental
Video Classification (CIVC) and propose a novel framework to address it. As a
subarea of incremental learning tasks, the challenge of catastrophic
forgetting is unavoidable in CIVC. To better alleviate it, we utilize some
characteristics of videos. First, we decompose the spatio-temporal knowledge
before distillation rather than treating it as a whole in the knowledge
transfer process; trajectory is also used to refine the decomposition. Second,
we propose a dual granularity exemplar selection method to select and store
representative video instances of old classes and key-frames inside videos
under a tight storage budget. We benchmark our method and previous SOTA
class-incremental learning methods on Something-Something V2 and Kinetics
datasets, and our method outperforms previous methods significantly.
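To make the first idea concrete, here is a minimal sketch of distilling decomposed spatio-temporal knowledge, assuming (B, C, T, H, W) backbone features. The pooling scheme and loss weights are illustrative assumptions, not the paper's exact formulation, and the trajectory-based refinement mentioned in the abstract is omitted.

```python
import torch
import torch.nn.functional as F

def decomposed_distillation_loss(student_feat, teacher_feat,
                                 w_spatial=1.0, w_temporal=1.0):
    """Distill spatial and temporal knowledge through separate terms.

    Both feature maps are assumed to have shape (B, C, T, H, W), e.g.
    the output of a 3D-CNN backbone. Illustrative sketch only.
    """
    # Spatial knowledge: average over time, keep the spatial layout.
    s_spatial = student_feat.mean(dim=2)        # (B, C, H, W)
    t_spatial = teacher_feat.mean(dim=2)

    # Temporal knowledge: average over space, keep the frame ordering.
    s_temporal = student_feat.mean(dim=(3, 4))  # (B, C, T)
    t_temporal = teacher_feat.mean(dim=(3, 4))

    loss_spatial = F.mse_loss(s_spatial, t_spatial.detach())
    loss_temporal = F.mse_loss(s_temporal, t_temporal.detach())
    return w_spatial * loss_spatial + w_temporal * loss_temporal
```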
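The second idea, dual granularity exemplar selection, can likewise be sketched as a two-level pick under a storage budget: representative videos per old class, then key-frames inside each kept video. The herding-style nearest-to-mean criterion below is an assumption, not necessarily the paper's actual selection rule.

```python
import torch

def select_exemplars(video_feats, frame_feats, n_videos, n_frames):
    """Dual-granularity selection sketch (nearest-to-mean, an assumption).

    video_feats: (N, D) tensor, one feature per candidate video of a class
    frame_feats: list of N tensors, each (T_i, D), per-frame features
    Returns (video_index, kept_frame_indices) pairs.
    """
    # Video granularity: keep videos nearest to the class mean feature.
    class_mean = video_feats.mean(dim=0, keepdim=True)       # (1, D)
    dists = torch.cdist(video_feats, class_mean).squeeze(1)  # (N,)
    keep = torch.argsort(dists)[:n_videos]

    exemplars = []
    for idx in keep.tolist():
        frames = frame_feats[idx]                            # (T, D)
        # Frame granularity: key-frames nearest to the video mean.
        video_mean = frames.mean(dim=0, keepdim=True)
        fdist = torch.cdist(frames, video_mean).squeeze(1)
        key = torch.argsort(fdist)[:n_frames]
        exemplars.append((idx, key.tolist()))
    return exemplars
```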
Related papers
- MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning [10.95493493610559]
We propose a method to Mitigate Catastrophic Forgetting in class-incremental learning for multimodal Video Captioning (MCF-VC).
To better constrain the knowledge characteristics of old and new tasks at the feature level, we design a Two-stage Knowledge Distillation (TsKD) scheme.
Our experiments on the public dataset MSR-VTT show that the proposed method significantly resists the forgetting of previous tasks without replaying old samples, and performs well on the new task.
arXiv Detail & Related papers (2024-02-27T16:54:08Z)
- Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition [62.85802939587308]
This paper focuses on exploring Class Incremental Audio-Visual Video Recognition (CIAVVR)
Since both stored data and learned model of past classes contain historical knowledge, the core challenge is how to capture past data knowledge and past model knowledge to prevent catastrophic forgetting.
We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models.
arXiv Detail & Related papers (2024-01-11T23:00:24Z)
- Just a Glimpse: Rethinking Temporal Information for Video Continual Learning [58.7097258722291]
We propose a novel replay mechanism for effective video continual learning based on individual/single frames.
Under extreme memory constraints, video diversity plays a more significant role than temporal information.
Our method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.
arXiv Detail & Related papers (2023-05-28T19:14:25Z)
- vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
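As a hedged sketch of what such a regularizer could look like (our reading, not necessarily vCLIMB's exact loss): pull the per-frame embeddings of a replayed video toward their temporal mean, so that individually stored frames stay consistent with the video they came from.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(frame_embeds):
    """Temporal consistency regularizer sketch (an assumed form).

    frame_embeds: (B, T, D) embeddings of T frames per replayed video.
    Penalizes per-frame deviation from the video's mean embedding.
    """
    mean_embed = frame_embeds.mean(dim=1, keepdim=True)  # (B, 1, D)
    return F.mse_loss(frame_embeds, mean_embed.expand_as(frame_embeds))
```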
arXiv Detail & Related papers (2022-01-23T22:14:17Z)
- TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification [26.12591949900602]
We formulate a text-based task conditioner to adapt video features to the few-shot learning task.
Our model obtains state-of-the-art performance on four challenging benchmarks in few-shot video action classification.
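The summary does not specify the conditioner's form; a FiLM-style modulation is one plausible shape, sketched below with assumed dimensions (TNT's actual architecture may differ).

```python
import torch
import torch.nn as nn

class TextConditioner(nn.Module):
    """FiLM-style sketch of a text-based task conditioner (assumed design):
    a text embedding produces a scale and shift that adapt video features
    to the current few-shot task.
    """
    def __init__(self, text_dim, video_dim):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, video_dim)
        self.to_shift = nn.Linear(text_dim, video_dim)

    def forward(self, video_feat, text_embed):
        # video_feat: (B, D_v), text_embed: (B, D_t)
        scale = self.to_scale(text_embed)
        shift = self.to_shift(text_embed)
        return video_feat * (1 + scale) + shift
```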
arXiv Detail & Related papers (2021-06-21T15:08:08Z)
- Efficient training for future video generation based on hierarchical disentangled representation of latent variables [66.94698064734372]
We propose a novel method for generating future prediction videos with less memory usage than the conventional methods.
We achieve high efficiency by training our method in two stages: (1) image reconstruction to encode video frames into latent variables, and (2) latent variable prediction to generate the future sequence.
Our experiments show that the proposed method can efficiently generate future prediction videos, even for complex datasets that cannot be handled by previous methods.
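A minimal sketch of the two-stage recipe, with toy module shapes as assumptions: stage one trains an autoencoder on individual frames, and stage two trains a sequence model purely in latent space, so the pixel-space decoder never has to process long sequences.

```python
import torch
import torch.nn as nn

# Assumed toy modules: a frame autoencoder and a latent-space predictor.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
decoder = nn.Sequential(nn.Linear(256, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))
predictor = nn.GRU(input_size=256, hidden_size=256, batch_first=True)

frames = torch.rand(8, 10, 3, 64, 64)          # (B, T, C, H, W)

# Stage 1: image reconstruction, frame by frame.
z = encoder(frames.flatten(0, 1))              # (B*T, 256)
recon = decoder(z)
loss_stage1 = nn.functional.mse_loss(recon, frames.flatten(0, 1))

# Stage 2: predict the next latent from the past latents.
z_seq = z.view(8, 10, 256)
pred, _ = predictor(z_seq[:, :-1])             # predict z_{t+1} from z_{<=t}
loss_stage2 = nn.functional.mse_loss(pred, z_seq[:, 1:].detach())
```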
arXiv Detail & Related papers (2021-06-07T10:43:23Z)
- Learning Implicit Temporal Alignment for Few-shot Video Classification [40.57508426481838]
Few-shot video classification aims to learn new video categories with only a few labeled examples.
It is particularly challenging to learn a class-invariant spatial-temporal representation in such a setting.
We propose a novel matching-based few-shot learning strategy for video sequences in this work.
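One way to picture implicit alignment, assuming L2-normalized per-frame features: each query frame softly attends to the support frames it matches best, with no hard frame-to-frame correspondence. The paper's exact matching score may differ.

```python
import torch

def soft_alignment_score(query_frames, support_frames, tau=0.1):
    """Soft (implicit) temporal alignment sketch (assumed form).

    query_frames: (Tq, D), support_frames: (Ts, D), L2-normalized.
    Returns a scalar similarity between the two videos.
    """
    sim = query_frames @ support_frames.t()      # (Tq, Ts) cosine similarities
    attn = torch.softmax(sim / tau, dim=1)       # soft match per query frame
    return (attn * sim).sum(dim=1).mean()        # aligned similarity
```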
arXiv Detail & Related papers (2021-05-11T07:18:57Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
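A sketch in the spirit of aggregating long-range temporal context over frame-level features (layer sizes are assumptions): self-attention mixes context across frames, and the pooled, normalized descriptor is what a contrastive loss such as InfoNCE would then train.

```python
import torch
import torch.nn as nn

class TemporalContextAggregator(nn.Module):
    """Sketch of temporal context aggregation (assumed dimensions):
    self-attention injects long-range temporal context into frame
    features before pooling them into one video-level descriptor.
    """
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_feats):              # (B, T, D)
        ctx, _ = self.attn(frame_feats, frame_feats, frame_feats)
        video_feat = ctx.mean(dim=1)             # (B, D)
        return nn.functional.normalize(video_feat, dim=1)
```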
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.