Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets
- URL: http://arxiv.org/abs/2505.20694v1
- Date: Tue, 27 May 2025 04:02:57 GMT
- Title: Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets
- Authors: Xulin Gu, Xinhao Zhong, Zhixing Wei, Yimin Zhou, Shuoyang Sun, Bin Chen, Hongpeng Wang, Yuan Luo
- Abstract summary: We propose a novel uni-level video dataset distillation framework. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism. Our method achieves state-of-the-art performance, bridging the gap between real and distilled video data.
- Score: 13.22969334943219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video data. Existing video distillation (VD) methods often suffer from excessive computational costs and struggle to preserve temporal dynamics, as naïve extensions of image-based approaches typically lead to degraded performance. In this paper, we propose a novel uni-level video dataset distillation framework that directly optimizes synthetic videos with respect to a pre-trained model. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism that leverages inter-frame differences to guide the distillation process, encouraging the retention of informative temporal cues while suppressing frame-level redundancy. Extensive experiments on standard video benchmarks demonstrate that our method achieves state-of-the-art performance, bridging the gap between real and distilled video data and offering a scalable solution for video dataset compression.
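The abstract says the filtering mechanism "leverages inter-frame differences to guide the distillation process". Below is a minimal PyTorch sketch of how such differences could be turned into per-frame saliency weights; the weighting scheme, normalization, and the loss-weighting helper are our illustrative assumptions, not the paper's exact formulation:

```python
import torch

def temporal_saliency_weights(video: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """video: (T, C, H, W) clip with T >= 2. Returns (T,) weights summing to ~1."""
    # Mean absolute inter-frame difference as a cheap motion/saliency proxy.
    diffs = (video[1:] - video[:-1]).abs().mean(dim=(1, 2, 3))  # (T-1,)
    # Frame 0 has no predecessor; reuse the average difference as its score.
    saliency = torch.cat([diffs.mean().unsqueeze(0), diffs])    # (T,)
    return saliency / (saliency.sum() + eps)

def saliency_weighted_loss(per_frame_loss: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # Emphasize frames carrying informative temporal cues while
    # down-weighting temporally redundant ones.
    return (weights.detach() * per_frame_loss).sum()
```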
Related papers
- PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion [22.804486552524885]
This paper introduces PRISM, Progressive Refinement and Insertion for Sparse Motion, for video dataset condensation. Unlike previous methods that separate static content from dynamic motion, our method preserves the essential interdependence between these elements. Our approach progressively refines and inserts frames to fully accommodate the motion in an action while achieving better performance with less storage.
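As a rough illustration of "progressively refines and inserts frames", here is a hypothetical greedy insertion loop that repeatedly adds the midpoint frame of the kept-frame gap with the largest motion; this is our simplification for illustration, not PRISM's actual procedure:

```python
import torch

def progressive_frame_insertion(video: torch.Tensor, budget: int) -> list[int]:
    """video: (T, C, H, W), budget >= 2. Returns indices of kept frames,
    grown greedily toward the regions with the most motion."""
    T = video.shape[0]
    keep = [0, T - 1]  # start from the clip endpoints
    while len(keep) < min(budget, T):
        keep.sort()
        # Score each gap between adjacent kept frames by inter-frame difference.
        gaps = [((video[a] - video[b]).abs().mean().item(), a, b)
                for a, b in zip(keep, keep[1:]) if b - a > 1]
        if not gaps:
            break
        _, a, b = max(gaps)        # gap with the largest motion
        keep.append((a + b) // 2)  # insert its midpoint frame
    return sorted(keep)
```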
arXiv Detail & Related papers (2025-05-28T16:42:10Z)
- Dynamic-Aware Video Distillation: Optimizing Temporal Resolution Based on Video Semantics [68.85010825225528]
Video datasets present unique challenges due to the presence of temporal information and varying levels of redundancy across different classes. Existing DD approaches assume a uniform level of temporal redundancy across all video semantics, which limits their effectiveness on video datasets. We propose Dynamic-Aware Video Distillation (DAViD), a Reinforcement Learning (RL) approach to predict the optimal temporal resolution of the synthetic videos.
arXiv Detail & Related papers (2025-05-28T11:43:58Z)
- Video Dataset Condensation with Diffusion Models [7.44997213284633]
Video dataset distillation is a promising solution to generate a compact synthetic dataset that retains the essential information from a large real dataset. In this paper, we focus on video dataset distillation by employing a video diffusion model to generate high-quality synthetic videos. To enhance representativeness, we introduce the Video Spatio-Temporal U-Net (VST-UNet), a model designed to select a diverse and informative subset of videos. We validate the effectiveness of our approach through extensive experiments on four benchmark datasets, demonstrating performance improvements of up to 10.61% over the state-of-the-art.
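The summary does not detail how VST-UNet scores videos; a standard baseline for choosing "a diverse and informative subset" is greedy k-center selection over per-video embeddings, sketched below under that assumption (the seeding rule is ours):

```python
import torch

def greedy_diverse_subset(feats: torch.Tensor, k: int) -> list[int]:
    """feats: (N, D) per-video embeddings, k >= 1. Greedy k-center:
    repeatedly pick the video farthest from everything chosen so far."""
    chosen = [int(feats.norm(dim=1).argmax())]  # seed with the largest-norm item
    min_d = torch.cdist(feats, feats[chosen]).squeeze(1)  # distance to the subset
    for _ in range(k - 1):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = torch.minimum(min_d, torch.cdist(feats, feats[nxt:nxt + 1]).squeeze(1))
    return chosen
```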
arXiv Detail & Related papers (2025-05-10T15:12:19Z)
- AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset [55.82208863521353]
We propose AccVideo to reduce the number of inference steps, accelerating video diffusion models with a synthetic dataset. Our model achieves an 8.5x improvement in generation speed compared to the teacher model. Compared to previous acceleration methods, our approach is capable of generating videos with higher quality and resolution.
arXiv Detail & Related papers (2025-03-25T08:52:07Z)
- Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z)
- Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3x inference speedup) still performs comparably with top baselines.
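A hedged sketch of the loss mix this summary describes (MSE reconstruction + KL term + LPIPS perceptual loss); the weights and the use of the `lpips` package are illustrative assumptions, not the paper's settings:

```python
import torch
import lpips  # pip install lpips

# Perceptual distance with a VGG backbone, a common choice for LPIPS.
lpips_fn = lpips.LPIPS(net='vgg')

def tokenizer_recon_loss(recon: torch.Tensor, target: torch.Tensor,
                         mu: torch.Tensor, logvar: torch.Tensor,
                         kl_weight: float = 1e-6, lpips_weight: float = 0.1) -> torch.Tensor:
    """MSE reconstruction + KL regularizer on the latent (mu, logvar)
    + LPIPS perceptual term. Weights are illustrative, not the paper's."""
    mse = torch.nn.functional.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    perceptual = lpips_fn(recon, target).mean()  # expects (N, 3, H, W) in [-1, 1]
    return mse + kl_weight * kl + lpips_weight * perceptual
```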
arXiv Detail & Related papers (2025-03-05T17:59:19Z)
- Video Set Distillation: Information Diversification and Temporal Densification [68.85010825225528]
Video sets have two dimensions of redundancy: within-sample and inter-sample redundancy. We are the first to study Video Set Distillation, which synthesizes optimized video data by addressing within-sample and inter-sample redundancies.
arXiv Detail & Related papers (2024-11-28T05:37:54Z)
- Self-Conditioned Probabilistic Learning of Video Rescaling [70.10092286301997]
We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost in downscaling by maximizing its conditional probability given strong spatio-temporal prior information.
We extend the framework to a lossy video compression system, in which a gradient estimator for non-differentiable industrial lossy codecs is proposed.
arXiv Detail & Related papers (2021-07-24T15:57:15Z)
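The last entry's gradient estimator for non-differentiable codecs is not specified in the summary; a common baseline is the straight-through estimator, sketched here in PyTorch with a hypothetical `lossy_codec` stub standing in for a real codec round-trip:

```python
import torch

def lossy_codec(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for a real (non-differentiable) codec round-trip;
    # here, coarse 8-bit quantization of pixel values in [0, 1].
    return torch.round(x.clamp(0, 1) * 255) / 255

class StraightThroughCodec(torch.autograd.Function):
    """Forward applies the real lossy codec; backward passes gradients
    through unchanged (d(codec)/dx ~= I), keeping training end-to-end."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        return lossy_codec(x)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        return grad_output

# Usage: gradients flow to x as if the codec were the identity.
x = torch.rand(2, 3, 64, 64, requires_grad=True)
y = StraightThroughCodec.apply(x)
y.sum().backward()
```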
This list is automatically generated from the titles and abstracts of the papers on this site.