Learning a Condensed Frame for Memory-Efficient Video Class-Incremental
Learning
- URL: http://arxiv.org/abs/2211.00833v1
- Date: Wed, 2 Nov 2022 02:37:20 GMT
- Title: Learning a Condensed Frame for Memory-Efficient Video Class-Incremental
Learning
- Authors: Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong
Wang, Mingqian Tang, Nong Sang, Xueming Qian
- Abstract summary: We propose FrameMaker, a memory-efficient video class-incremental learning approach.
We show that FrameMaker achieves better performance than recent advanced methods while consuming only 20% of the memory.
Under the same memory consumption, FrameMaker significantly outperforms existing state-of-the-art methods by a convincing margin.
- Score: 41.514250287733354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent incremental learning for action recognition usually stores
representative videos to mitigate catastrophic forgetting. However, only a few
bulky videos can be stored due to the limited memory. To address this problem,
we propose FrameMaker, a memory-efficient video class-incremental learning
approach that learns to produce a condensed frame for each selected video.
Specifically, FrameMaker is mainly composed of two crucial components: Frame
Condensing and Instance-Specific Prompt. The former is to reduce the memory
cost by preserving only one condensed frame instead of the whole video, while
the latter aims to compensate for the spatio-temporal details lost in the
Frame Condensing stage. In this way, FrameMaker achieves a remarkable reduction
in memory while keeping enough information for the following incremental tasks.
Experimental results on multiple challenging benchmarks, i.e., HMDB51, UCF101
and Something-Something V2, demonstrate that FrameMaker achieves better
performance than recent advanced methods while consuming only 20% of the memory.
Additionally, under the same memory consumption conditions, FrameMaker
significantly outperforms existing state-of-the-art methods by a convincing margin.
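For intuition, the two components can be sketched as a small optimization over a
stored video: Frame Condensing learns per-frame blending weights that collapse
the clip into one frame, and the Instance-Specific Prompt is a small learnable
vector kept alongside it. The PyTorch code below is a hedged illustration only;
the prompt-conditioned backbone call, the distillation-style objective, and all
names are assumptions made for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CondensedExemplar(nn.Module):
    """Hypothetical memory item: one condensed frame plus an
    instance-specific prompt, fitted per stored video."""

    def __init__(self, video: torch.Tensor, prompt_dim: int = 768):
        super().__init__()
        t = video.shape[0]                      # video: (T, C, H, W)
        self.register_buffer("video", video)    # needed only while condensing
        # Frame Condensing: learnable weights blending the T frames into one.
        self.frame_weights = nn.Parameter(torch.zeros(t))
        # Instance-Specific Prompt: learnable vector meant to recover part of
        # the spatio-temporal detail lost by keeping a single frame.
        self.prompt = nn.Parameter(torch.zeros(prompt_dim))

    def condensed_frame(self) -> torch.Tensor:
        w = torch.softmax(self.frame_weights, dim=0)        # (T,)
        return torch.einsum("t,tchw->chw", w, self.video)   # (C, H, W)


def condense(video, backbone, teacher_logits, steps=100, lr=0.01):
    """Fit the frame and prompt so a frozen, prompt-conditioned backbone's
    prediction on them matches its prediction on the full video (a
    distillation-style surrogate; the paper's actual objective may differ)."""
    for p in backbone.parameters():             # keep the backbone frozen
        p.requires_grad_(False)
    ex = CondensedExemplar(video)
    opt = torch.optim.Adam(ex.parameters(), lr=lr)
    for _ in range(steps):
        logits = backbone(ex.condensed_frame().unsqueeze(0), prompt=ex.prompt)
        loss = nn.functional.kl_div(logits.log_softmax(-1),
                                    teacher_logits.softmax(-1),
                                    reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    # Only the single frame and the small prompt enter the replay memory.
    return ex.condensed_frame().detach(), ex.prompt.detach()
```

What gets stored per exemplar is then a single (C, H, W) frame plus a short
prompt vector rather than a full T-frame clip, which is where the memory
reduction described above comes from.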
Related papers
- Memory-Efficient Continual Learning Object Segmentation for Long Video [7.9190306016374485]
We propose two novel techniques to reduce the memory requirement of Online VOS methods while improving modeling accuracy and generalization on long videos.
Motivated by the success of continual learning techniques in preserving previously-learned knowledge, we propose Gated-Regularizer Continual Learning (GRCL) and Reconstruction-based Memory Selection Continual Learning (RMSCL).
Experimental results show that the proposed methods are able to improve the performance of Online VOS models by more than 8%, with improved robustness on long-video datasets.
arXiv Detail & Related papers (2023-09-26T21:22:03Z) - Just a Glimpse: Rethinking Temporal Information for Video Continual
Learning [58.7097258722291]
We propose a novel replay mechanism for effective video continual learning based on individual/single frames.
Under extreme memory constraints, video diversity plays a more significant role than temporal information.
Our method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.
arXiv Detail & Related papers (2023-05-28T19:14:25Z) - READMem: Robust Embedding Association for a Diverse Memory in
Unconstrained Video Object Segmentation [24.813416082160224]
We present READMem, a modular framework for sVOS methods to handle unconstrained videos.
We propose a robust association of the embeddings stored in the memory with query embeddings during the update process.
Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences.
arXiv Detail & Related papers (2023-05-22T08:31:16Z) - Per-Clip Video Object Segmentation [110.08925274049409]
Recently, memory-based approaches show promising results on semi-supervised video object segmentation.
We treat video object segmentation as clip-wise mask propagation.
We propose a new method tailored for the per-clip inference.
arXiv Detail & Related papers (2022-08-03T09:02:29Z) - Learning Quality-aware Dynamic Memory for Video Object Segmentation [32.06309833058726]
We propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame.
Our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks.
arXiv Detail & Related papers (2022-07-16T12:18:04Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - Memory-Augmented Non-Local Attention for Video Super-Resolution [61.55700315062226]
We propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones.
Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame.
In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment.
arXiv Detail & Related papers (2021-08-25T05:12:14Z) - Beyond Short Clips: End-to-End Video-Level Learning with Collaborative
Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.