Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning
- URL: http://arxiv.org/abs/2511.21136v1
- Date: Wed, 26 Nov 2025 07:36:37 GMT
- Title: Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning
- Authors: Changlin Li, Jiawei Zhang, Shuhao Liu, Sihao Lin, Zeyi Shi, Zhihui Li, Xiaojun Chang
- Abstract summary: Ent-Prog is an efficient training framework tailored for diffusion models on human video generation. It reduces both training time and GPU memory consumption while maintaining model performance.
- Score: 52.25669274734097
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human video generation has advanced rapidly with the development of diffusion models, but the high computational cost and substantial memory consumption associated with training these models on high-resolution, multi-frame data pose significant challenges. In this paper, we propose Entropy-Guided Prioritized Progressive Learning (Ent-Prog), an efficient training framework tailored for diffusion models on human video generation. First, we introduce Conditional Entropy Inflation (CEI) to assess the importance of different model components on the target conditional generation task, enabling prioritized training of the most critical components. Second, we introduce a progressive schedule that adaptively increases computational complexity during training by measuring convergence efficiency. Ent-Prog reduces both training time and GPU memory consumption while maintaining model performance. Extensive experiments across three datasets demonstrate the effectiveness of Ent-Prog, achieving up to 2.2$\times$ training speedup and 2.4$\times$ GPU memory reduction without compromising generative performance.
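The adaptive progressive schedule described above can be illustrated with a minimal sketch: train at a cheap resolution/frame-count stage first, and step up complexity once the recent loss curve flattens. The class name, stage tuples, window size, and the relative-improvement threshold below are all illustrative assumptions standing in for the paper's convergence-efficiency measure, not the authors' actual implementation.

```python
from collections import deque


class AdaptiveProgressiveSchedule:
    """Hypothetical sketch: advance to a costlier training stage once the
    loss stops improving over a sliding window."""

    def __init__(self, stages, window=100, min_improvement=1e-3):
        # stages: list of (height, width, num_frames), cheapest first
        self.stages = stages
        self.stage_idx = 0
        self.losses = deque(maxlen=window)
        self.min_improvement = min_improvement

    @property
    def current_stage(self):
        return self.stages[self.stage_idx]

    def update(self, loss):
        """Record one training loss; return the stage to train at next."""
        self.losses.append(loss)
        if len(self.losses) < self.losses.maxlen:
            return self.current_stage  # not enough history yet
        half = self.losses.maxlen // 2
        older = sum(list(self.losses)[:half]) / half
        recent = sum(list(self.losses)[half:]) / half
        # Relative improvement over the window as a crude proxy for
        # the paper's "convergence efficiency".
        improvement = (older - recent) / max(abs(older), 1e-8)
        if improvement < self.min_improvement and self.stage_idx < len(self.stages) - 1:
            self.stage_idx += 1   # escalate resolution / frame count
            self.losses.clear()   # restart the window at the new stage
        return self.current_stage
```

In use, the training loop would query `update(loss)` each step and rebuild its dataloader whenever the returned stage changes; the real method additionally prioritizes which model components to train via CEI, which this sketch does not model.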
Related papers
- Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank [65.00301565190824]
Repulsor is a plug-and-play training framework that requires no external encoders. It achieves a state-of-the-art FID of 2.40 within 400k steps, significantly outperforming comparable methods.
arXiv Detail & Related papers (2025-12-09T14:39:26Z) - A Time-Series Data Augmentation Model through Diffusion and Transformer Integration [0.6437284704257459]
Deep neural networks typically require large volumes of data for training. We propose a simple and effective method that combines the Diffusion and Transformer models. Using the performance improvement of the model after applying augmented data as a benchmark, this approach shows its capability to produce high-quality augmented data.
arXiv Detail & Related papers (2025-05-01T09:40:45Z) - A new pathway to generative artificial intelligence by minimizing the maximum entropy [0.0]
Current models are trained by minimizing the distance between the produced data and the training set. We introduce a paradigm shift through a framework where we do not fit the training set but instead find the most informative yet least noisy representation of the data. The result is a general physics-driven model, which is data-efficient and flexible, making it possible to control and influence the generative process.
arXiv Detail & Related papers (2025-02-18T21:16:33Z) - VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation [0.0]
This work introduces VIRL, a Volume-Informed Representation Learning approach to pre-train a 3D geometric encoder.
The model pre-trained by VIRL demonstrates substantially improved generalizability with limited data.
arXiv Detail & Related papers (2024-06-18T05:30:26Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Preparing Lessons for Progressive Training on Language Models [75.88952808979087]
The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions.
We propose Apollo, which prepares lessons for expanding operations by layer functionality during the training of low layers.
Experiments demonstrate that Apollo achieves state-of-the-art acceleration ratios, even rivaling methods using pretrained models.
arXiv Detail & Related papers (2024-01-17T13:04:14Z) - Fast-ELECTRA for Efficient Pre-training [83.29484808667532]
ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model.
We propose Fast-ELECTRA, which leverages an existing language model as the auxiliary model.
Our approach rivals the performance of state-of-the-art ELECTRA-style pre-training methods, while largely eliminating the computation and memory cost of jointly training the auxiliary model.
arXiv Detail & Related papers (2023-10-11T09:55:46Z) - Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data [0.0]
Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges.
Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models.
"phased data augmentation" is a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution.
arXiv Detail & Related papers (2023-05-22T03:38:59Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline that aims to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance with far fewer trainable parameters and high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.