A Large-Scale Study on Video Action Dataset Condensation
- URL: http://arxiv.org/abs/2412.21197v2
- Date: Wed, 12 Mar 2025 03:28:28 GMT
- Title: A Large-Scale Study on Video Action Dataset Condensation
- Authors: Yang Chen, Sheng Guo, Bo Zheng, Limin Wang
- Abstract summary: We aim to bridge the gap between image and video dataset condensation by providing a large-scale study with systematic design and fair comparison. Our work delves into three key aspects to provide valuable empirical insights: (1) temporal processing of video data, (2) the evaluation protocol for video dataset condensation, and (3) adaptation of condensation algorithms to the space-time domain.
- Score: 35.194593167922804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, dataset condensation has made significant progress in the image domain. Unlike images, videos possess an additional temporal dimension, which harbors considerable redundant information, making condensation even more crucial. However, video dataset condensation still remains an underexplored area. We aim to bridge this gap by providing a large-scale study with systematic design and fair comparison. Specifically, our work delves into three key aspects to provide valuable empirical insights: (1) temporal processing of video data, (2) the evaluation protocol for video dataset condensation, and (3) adaptation of condensation algorithms to the space-time domain. From this study, we derive several intriguing observations: (i) labeling methods greatly influence condensation performance, (ii) simple sliding-window sampling is effective for temporal processing, and (iii) dataset distillation methods perform better in challenging scenarios, while sample selection methods excel in easier ones. Furthermore, we propose a unified evaluation protocol for the fair comparison of different condensation algorithms and achieve state-of-the-art results on four widely-used action recognition datasets: HMDB51, UCF101, SSv2 and K400. Our code is available at https://github.com/MCG-NJU/Video-DC.
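Observation (ii) above singles out simple sliding-window sampling for temporal processing. As a rough illustration of the idea (not the paper's actual implementation; see the linked repository for that), the sketch below splits a video tensor into overlapping fixed-length clips; the function name and parameters are ours.

```python
import numpy as np

def sliding_window_clips(video: np.ndarray, clip_len: int = 8, stride: int = 4) -> np.ndarray:
    """Split a video of shape (T, H, W, C) into overlapping fixed-length clips."""
    num_frames = video.shape[0]
    starts = range(0, max(num_frames - clip_len, 0) + 1, stride)
    return np.stack([video[s:s + clip_len] for s in starts])

# A 32-frame video with clip_len=8 and stride=4 yields (32 - 8)/4 + 1 = 7 clips.
clips = sliding_window_clips(np.zeros((32, 112, 112, 3)))
assert clips.shape == (7, 8, 112, 112, 3)
```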
Related papers
- Dataset Condensation with Color Compensation [1.8962690634270805]
Existing methods struggle with two issues: image-level selection methods (Coreset Selection, Dataset Quantization) suffer from condensation inefficiency. We find that a critical problem in dataset condensation is the oversight of color's dual role as an information carrier and a basic semantic representation unit. We propose DC3: a dataset condensation framework with Color Compensation.
arXiv Detail & Related papers (2025-08-02T01:44:23Z) - PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion [22.804486552524885]
This paper introduces PRISM, Progressive Refinement and Insertion for Sparse Motion, for video dataset condensation. Unlike previous methods that separate static content from dynamic motion, our method preserves the essential interdependence between these elements. Our approach progressively refines and inserts frames to fully accommodate the motion in an action, achieving better performance with less storage.
arXiv Detail & Related papers (2025-05-28T16:42:10Z) - Latent Video Dataset Distillation [6.028880672839687]
We introduce a novel video dataset distillation approach that operates in the latent space.
We employ a diversity-aware data selection strategy to select both representative and diverse samples.
We also introduce a simple, training-free method to further compress the latent dataset.
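The "diversity-aware data selection strategy" is only named in this summary. As one common way such selection is realized over latent features, the sketch below uses k-center greedy selection (a standard coreset heuristic); this is an illustrative assumption, not the authors' algorithm.

```python
import numpy as np

def k_center_greedy(latents: np.ndarray, k: int) -> list:
    """Select k samples that are representative yet mutually diverse.

    Illustrative only: seed with the sample nearest the mean
    (representative), then repeatedly add the sample farthest from the
    current selection (diverse). Not the paper's implementation.
    """
    seed = int(np.argmin(np.linalg.norm(latents - latents.mean(axis=0), axis=1)))
    selected = [seed]
    dists = np.linalg.norm(latents - latents[seed], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # farthest from everything chosen so far
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(latents - latents[nxt], axis=1))
    return selected

indices = k_center_greedy(np.random.randn(1000, 256), k=50)
```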
arXiv Detail & Related papers (2025-04-23T22:50:39Z) - Condensing Action Segmentation Datasets via Generative Network Inversion [37.78120420622088]
This work presents the first condensation approach for procedural video datasets used in temporal action segmentation.
We propose a condensation framework that leverages a generative prior learned from the dataset and network inversion to condense data into compact latent codes.
Our evaluation on standard benchmarks demonstrates consistent effectiveness in condensing TAS datasets and achieving competitive performance.
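To give a concrete, if deliberately toy, picture of inversion-based condensation: fit a compact latent code so that a frozen generative model reproduces a target sample. The sketch below uses a linear "generator" and hand-written gradient steps purely for illustration; the paper's generator is a learned network and its losses differ.

```python
import numpy as np

# Toy inversion-based condensation: store a 64-d latent code instead of a
# 1024-d sample, recovering the sample through a frozen "generator".
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 64))    # frozen linear "generator": x = W @ z
x_target = rng.standard_normal(1024)   # stand-in for one video segment's features

z = np.zeros(64)                       # compact latent code to optimize
lr = 1e-4
for _ in range(500):
    grad = 2 * W.T @ (W @ z - x_target)  # gradient of ||W z - x_target||^2
    z -= lr * grad

# z is the condensed representation; W @ z approximately reconstructs x_target.
```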
arXiv Detail & Related papers (2025-03-18T10:29:47Z) - Video Set Distillation: Information Diversification and Temporal Densification [68.85010825225528]
Video sets have two dimensions of redundancy: within-sample and inter-sample. We are the first to study Video Set Distillation, which synthesizes optimized video data by addressing both within-sample and inter-sample redundancies.
arXiv Detail & Related papers (2024-11-28T05:37:54Z) - Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z) - Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must contend with high across-frame variation in object appearance and diverse degradation in some frames.
Most contemporary aggregation methods are tailored to two-stage detectors and suffer from high computational costs.
This study presents a very simple yet potent feature selection and aggregation strategy, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z) - On the Importance of Spatial Relations for Few-shot Action Recognition [109.2312001355221]
In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust spatial relations and incorporate temporal information.
Experiments reveal that, even without using any temporal information, the performance of SA-CT is comparable to that of temporal-based methods on 3 of 4 benchmarks.
arXiv Detail & Related papers (2023-08-14T12:58:02Z) - When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study [135.19004496785408]
Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications.
We benchmark different super-resolution methods on commonly used COD datasets.
We evaluate the robustness of different COD models by using COD data processed by SR methods.
arXiv Detail & Related papers (2023-08-08T16:17:46Z) - TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage spatial, then temporal, encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z) - Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z) - Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments [17.673345523918947]
We present a novel method for few-shot video classification, which performs appearance and temporal alignments.
Our approach achieves results similar to or better than previous methods on both datasets.
arXiv Detail & Related papers (2022-07-21T23:28:52Z) - DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z) - MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection [37.25262046781015]
Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos.
We propose a novel ConvTransformer network for action detection that efficiently captures both short-term and long-term temporal information.
Our network outperforms the state-of-the-art methods on all three datasets.
arXiv Detail & Related papers (2021-12-07T18:57:37Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
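As background for readers unfamiliar with the contrastive framework mentioned here, the sketch below computes a plain InfoNCE loss between two views of the same clips. It is a generic illustration only; CoCon's cooperative multi-view weighting is not reproduced.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE between two views, where z1[i] and z2[i] embed the same clip."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                              # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives on diagonal

loss = info_nce(np.random.randn(64, 128), np.random.randn(64, 128))
```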
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Selective Feature Compression for Efficient Activity Recognition Inference [26.43512549990624]
Selective Feature Compression (SFC) is an action recognition inference strategy that greatly increases model inference efficiency without compromising accuracy.
Our experiments on Kinetics-400, UCF101 and ActivityNet show that SFC reduces inference time by 6-7x and memory and dimension usage by 5-6x compared with the commonly used 30-crop dense sampling procedure.
arXiv Detail & Related papers (2021-04-01T00:54:51Z) - NUTA: Non-uniform Temporal Aggregation for Action Recognition [29.75987323741384]
We propose a method called the non-uniform temporal aggregation (NUTA), which aggregates features only from informative temporal segments.
Our model has achieved state-of-the-art performance on four widely used large-scale action-recognition datasets.
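"Aggregating features only from informative temporal segments" can be pictured as score-weighted temporal pooling. The toy sketch below makes that concrete; NUTA's actual scoring network and architecture are not reproduced here, and all names are ours.

```python
import numpy as np

def weighted_temporal_pool(feats: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pool per-segment features (T, D) with informativeness scores (T,)."""
    w = np.exp(scores - scores.max())
    w /= w.sum()                        # softmax over temporal segments
    return (w[:, None] * feats).sum(axis=0)

feats = np.random.randn(8, 512)         # 8 temporal segments, 512-d features each
scores = np.random.randn(8)             # e.g., from a lightweight scoring head
clip_feature = weighted_temporal_pool(feats, scores)  # pooled (512,) clip feature
```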
arXiv Detail & Related papers (2020-12-15T02:03:37Z)