Related papers: Unsupervised Video Class-Incremental Learning via Deep Embedded Clustering Management

URL: http://arxiv.org/abs/2601.14069v1
Date: Tue, 20 Jan 2026 15:25:41 GMT
Title: Unsupervised Video Class-Incremental Learning via Deep Embedded Clustering Management
Authors: Nattapong Kurpukdee, Adrian G. Bors
Abstract summary: Unsupervised video class incremental learning (uVCIL) represents an important learning paradigm for learning video information without forgetting. We propose a simple yet effective approach to address the uVCIL. We first consider a deep feature extractor network, providing a set of representative video features during each task without assuming any class or task information.
Score: 47.53991869205973
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unsupervised video class incremental learning (uVCIL) represents an important learning paradigm for learning video information without forgetting, and without considering any data labels. Prior approaches have focused on supervised class-incremental learning, relying on using the knowledge of labels and task boundaries, which is costly, requires human annotation, or is simply not a realistic option. In this paper, we propose a simple yet effective approach to address the uVCIL. We first consider a deep feature extractor network, providing a set of representative video features during each task without assuming any class or task information. We then progressively build a series of deep clusters from the extracted features. During the successive task learning, the model updated from the previous task is used as an initial state in order to transfer knowledge to the current learning task. We perform in-depth evaluations on three standard video action recognition datasets, including UCF101, HMDB51, and Something-to-Something V2, by ignoring the labels from the supervised setting. Our approach significantly outperforms other baselines on all datasets.
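The abstract describes a loop in which clusters are built progressively from extracted features and each task starts from the model left by the previous one. A minimal sketch of that warm-start idea follows. It is not the authors' method: plain k-means on fixed, synthetic 2-D features stands in for both the deep feature extractor and the deep embedded clustering, and every name here (`assign`, `kmeans`, `learn_task`, `blob`, `n_new`) is a hypothetical chosen for illustration.

```python
import numpy as np


def assign(features, centroids):
    """Return the index of the nearest centroid for each feature vector."""
    # Squared Euclidean distance between every feature and every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, init_centroids, n_iters=20):
    """Plain k-means refinement starting from the given centroids."""
    centroids = init_centroids.copy()
    for _ in range(n_iters):
        labels = assign(features, centroids)
        for k in range(len(centroids)):
            members = features[labels == k]
            # A centroid with no members in this task is left untouched.
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids


def learn_task(features, prev_centroids, n_new, rng):
    """Warm-start from the previous task's centroids and add n_new clusters."""
    # Assumption: new clusters are seeded from random samples of this task.
    seeds = features[rng.choice(len(features), size=n_new, replace=False)]
    if prev_centroids is None:
        init = seeds
    else:
        init = np.vstack([prev_centroids, seeds])
    return kmeans(features, init)


def blob(rng, center, n=50):
    """Synthetic stand-in for extracted video features (no labels used)."""
    return rng.normal(loc=center, scale=0.5, size=(n, 2))


rng = np.random.default_rng(0)
task1 = np.vstack([blob(rng, (0, 0)), blob(rng, (10, 0))])
task2 = np.vstack([blob(rng, (0, 10)), blob(rng, (10, 10))])

centroids = None
for feats in (task1, task2):
    # The state left by the previous task initialises the current one.
    centroids = learn_task(feats, centroids, n_new=2, rng=rng)

print(centroids.shape)
```

After both tasks the centroid array has four rows, two carried over from the first task and two added in the second. Leaving empty clusters untouched is the only protection against forgetting in this sketch. Centroids carried over from an earlier task can still drift if data from the current task falls closer to them than to the new seeds.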

Related papers

Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval [53.54695034420311]
In practice, videos are typically untrimmed in long durations with much more complicated background content. We propose a novel framework that distills generalization knowledge from a powerful large-scale vision-language pre-trained model. Experiment results demonstrate that our proposed model achieves state-of-the-art performance on TVR, ActivityNet, and Charades-STA datasets.
arXiv Detail & Related papers (2025-10-14T08:38:20Z)
Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering [47.53991869205973]
We propose a realistic scenario for the unsupervised video learning where neither task boundaries nor labels are provided when learning a succession of tasks. We also provide a non-parametric learning solution for the under-explored problem of unsupervised video continual learning.
arXiv Detail & Related papers (2025-08-29T16:49:03Z)
Enhancing Multi-Modal Video Sentiment Classification Through Semi-Supervised Clustering [0.0]
We aim to improve video sentiment classification by focusing on two key aspects: the video itself, the accompanying text, and the acoustic features. We are developing a method that utilizes clustering-based semi-supervised pre-training to extract meaningful representations from the data.
arXiv Detail & Related papers (2025-01-11T08:04:39Z)
Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame. ATM outperforms strong video pre-training baselines by 80% on average. We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
Weakly Supervised Video Individual Counting [126.75545291243142]
Video Individual Counting aims to predict the number of unique individuals in a single video. We introduce a weakly supervised VIC task, wherein trajectory labels are not provided. In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining.
arXiv Detail & Related papers (2023-12-10T16:12:13Z)
CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples. Existing methods in video action recognition rely on large labeled datasets from the same domain. We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language [38.02396786726476]
We propose to learn multi-modal representations from audio-visual data using cross-modal attention. In our generalised audio-visual zero-shot learning setting, we include all the training classes in the test-time search space. Due to the lack of a unified benchmark in this domain, we introduce a (generalised) zero-shot learning benchmark on three audio-visual datasets.
arXiv Detail & Related papers (2022-03-07T18:52:13Z)
Incremental Learning from Low-labelled Stream Data in Open-Set Video Face Recognition [0.0]
We propose a novel incremental learning approach which combines a deep features encoder with an Open-Set Dynamic Ensembles of SVM. Our method can use unsupervised operational data to enhance recognition. Results show a benefit of up to 15% F1-score increase with respect to non-adaptive state-of-the-art methods.
arXiv Detail & Related papers (2020-12-17T13:28:13Z)
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach. We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks. We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.