Learning a Condensed Frame for Memory-Efficient Video Class-Incremental
Learning
- URL: http://arxiv.org/abs/2211.00833v1
- Date: Wed, 2 Nov 2022 02:37:20 GMT
- Title: Learning a Condensed Frame for Memory-Efficient Video Class-Incremental
Learning
- Authors: Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong
Wang, Mingqian Tang, Nong Sang, Xueming Qian
- Abstract summary: We propose FrameMaker, a memory-efficient video class-incremental learning approach.
We show that FrameMaker achieves better performance than recent advanced methods while consuming only 20% of the memory.
Under the same memory consumption, FrameMaker significantly outperforms existing state-of-the-art methods by a convincing margin.
- Score: 41.514250287733354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent incremental learning for action recognition usually stores
representative videos to mitigate catastrophic forgetting. However, only a few
bulky videos can be stored due to the limited memory. To address this problem,
we propose FrameMaker, a memory-efficient video class-incremental learning
approach that learns to produce a condensed frame for each selected video.
Specifically, FrameMaker is mainly composed of two crucial components: Frame
Condensing and Instance-Specific Prompt. The former is to reduce the memory
cost by preserving only one condensed frame instead of the whole video, while
the latter aims to compensate for the spatio-temporal details lost in the
Frame Condensing stage. In this way, FrameMaker achieves a remarkable reduction
in memory while keeping enough information for the following incremental tasks.
Experimental results on multiple challenging benchmarks, i.e., HMDB51, UCF101
and Something-Something V2, demonstrate that FrameMaker achieves better
performance than recent advanced methods while consuming only 20% of the memory.
Additionally, under the same memory consumption conditions, FrameMaker
significantly outperforms existing state-of-the-art methods by a convincing margin.
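For intuition, the two components can be sketched as a small optimization over a
stored video: Frame Condensing learns per-frame blending weights that collapse
the clip into one frame, and the Instance-Specific Prompt is a small learnable
vector kept alongside it. The PyTorch code below is a hedged illustration only;
the prompt-conditioned backbone call, the distillation-style objective, and all
names are assumptions made for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CondensedExemplar(nn.Module):
    """Hypothetical memory item: one condensed frame plus an
    instance-specific prompt, fitted per stored video."""

    def __init__(self, video: torch.Tensor, prompt_dim: int = 768):
        super().__init__()
        t = video.shape[0]                      # video: (T, C, H, W)
        self.register_buffer("video", video)    # needed only while condensing
        # Frame Condensing: learnable weights blending the T frames into one.
        self.frame_weights = nn.Parameter(torch.zeros(t))
        # Instance-Specific Prompt: learnable vector meant to recover part of
        # the spatio-temporal detail lost by keeping a single frame.
        self.prompt = nn.Parameter(torch.zeros(prompt_dim))

    def condensed_frame(self) -> torch.Tensor:
        w = torch.softmax(self.frame_weights, dim=0)        # (T,)
        return torch.einsum("t,tchw->chw", w, self.video)   # (C, H, W)


def condense(video, backbone, teacher_logits, steps=100, lr=0.01):
    """Fit the frame and prompt so a frozen, prompt-conditioned backbone's
    prediction on them matches its prediction on the full video (a
    distillation-style surrogate; the paper's actual objective may differ)."""
    for p in backbone.parameters():             # keep the backbone frozen
        p.requires_grad_(False)
    ex = CondensedExemplar(video)
    opt = torch.optim.Adam(ex.parameters(), lr=lr)
    for _ in range(steps):
        logits = backbone(ex.condensed_frame().unsqueeze(0), prompt=ex.prompt)
        loss = nn.functional.kl_div(logits.log_softmax(-1),
                                    teacher_logits.softmax(-1),
                                    reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    # Only the single frame and the small prompt enter the replay memory.
    return ex.condensed_frame().detach(), ex.prompt.detach()
```

What gets stored per exemplar is then a single (C, H, W) frame plus a short
prompt vector rather than a full T-frame clip, which is where the memory
reduction described above comes from.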
Related papers
- Memory-Efficient Continual Learning Object Segmentation for Long Video [7.9190306016374485]
We propose two novel techniques to reduce the memory requirement of Online VOS methods while improving modeling accuracy and generalization on long videos.
Motivated by the success of continual learning techniques in preserving previously-learned knowledge, we propose Gated-Regularizer Continual Learning (GRCL) and Reconstruction-based Memory Selection Continual Learning (RMSCL).
Experimental results show that the proposed methods are able to improve the performance of Online VOS models by more than 8%, with improved robustness on long-video datasets.
arXiv Detail & Related papers (2023-09-26T21:22:03Z) - Just a Glimpse: Rethinking Temporal Information for Video Continual
Learning [58.7097258722291]
We propose a novel replay mechanism for effective video continual learning based on individual/single frames.
Under extreme memory constraints, video diversity plays a more significant role than temporal information.
Our method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.
arXiv Detail & Related papers (2023-05-28T19:14:25Z) - READMem: Robust Embedding Association for a Diverse Memory in
Unconstrained Video Object Segmentation [24.813416082160224]
We present READMem, a modular framework for sVOS methods to handle unconstrained videos.
We propose a robust association of the embeddings stored in the memory with query embeddings during the update process.
Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences.
arXiv Detail & Related papers (2023-05-22T08:31:16Z) - Per-Clip Video Object Segmentation [110.08925274049409]
Recently, memory-based approaches show promising results on semi-supervised video object segmentation.
We treat video object segmentation as clip-wise mask propagation.
We propose a new method tailored for the per-clip inference.
arXiv Detail & Related papers (2022-08-03T09:02:29Z) - Learning Quality-aware Dynamic Memory for Video Object Segmentation [32.06309833058726]
We propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame.
Our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks.
arXiv Detail & Related papers (2022-07-16T12:18:04Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - Memory-Augmented Non-Local Attention for Video Super-Resolution [61.55700315062226]
We propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones.
Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame.
In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment.
arXiv Detail & Related papers (2021-08-25T05:12:14Z) - Beyond Short Clips: End-to-End Video-Level Learning with Collaborative
Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.