Patch Diffusion: Faster and More Data-Efficient Training of Diffusion
Models
- URL: http://arxiv.org/abs/2304.12526v2
- Date: Wed, 18 Oct 2023 21:53:33 GMT
- Title: Patch Diffusion: Faster and More Data-Efficient Training of Diffusion
Models
- Authors: Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He,
Zhangyang Wang, Weizhu Chen, Mingyuan Zhou
- Abstract summary: We propose Patch Diffusion, a generic patch-wise training framework.
Patch Diffusion significantly reduces the training time costs while improving data efficiency.
We achieve outstanding FID scores in line with state-of-the-art benchmarks.
- Score: 166.64847903649598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are powerful, but they require a lot of time and data to
train. We propose Patch Diffusion, a generic patch-wise training framework, to
significantly reduce the training time costs while improving data efficiency,
which thus helps democratize diffusion model training to broader users. At the
core of our innovations is a new conditional score function at the patch level,
where the patch location in the original image is included as additional
coordinate channels, while the patch size is randomized and diversified
throughout training to encode the cross-region dependency at multiple scales.
Sampling with our method is as easy as in the original diffusion model. Through
Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while
maintaining comparable or better generation quality. Patch Diffusion meanwhile
improves the performance of diffusion models trained on relatively small
datasets, e.g., as few as 5,000 images to train from scratch. We achieve
outstanding FID scores in line with state-of-the-art benchmarks: 1.77 on
CelebA-64$\times$64, 1.93 on AFHQv2-Wild-64$\times$64, and 2.72 on
ImageNet-256$\times$256. We share our code and pre-trained models at
https://github.com/Zhendong-Wang/Patch-Diffusion.
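For intuition, here is a minimal sketch of the patch-plus-coordinate-channel conditioning described above, assuming PyTorch; the helper name `random_patch_with_coords`, the patch-size set, and the shared-per-batch crop location are illustrative assumptions, not the released implementation:

```python
import random

import torch


def random_patch_with_coords(images, patch_sizes=(16, 32, 64)):
    """Crop a random patch and append normalized (x, y) coordinate channels
    recording where the patch sits in the original image.

    images: (B, C, H, W) tensor. Returns a (B, C + 2, p, p) tensor.
    The patch size p is re-sampled on every call, so training sees
    patches at multiple scales, as described in the abstract.
    """
    B, C, H, W = images.shape
    p = random.choice(patch_sizes)

    # Full-image coordinate grids, normalized to [-1, 1].
    ys = torch.linspace(-1, 1, H, device=images.device)
    xs = torch.linspace(-1, 1, W, device=images.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([grid_x, grid_y]).expand(B, 2, H, W)

    # Random crop location (shared across the batch here for simplicity).
    top = random.randint(0, H - p)
    left = random.randint(0, W - p)

    patch = images[:, :, top:top + p, left:left + p]
    patch_coords = coords[:, :, top:top + p, left:left + p]
    return torch.cat([patch, patch_coords], dim=1)


# Usage: the denoiser receives (C + 2)-channel inputs, so its score estimate
# is conditioned on the patch location, echoing the patch-level conditional
# score function described in the abstract.
x = torch.randn(4, 3, 64, 64)              # dummy image batch
patch_input = random_patch_with_coords(x)
print(patch_input.shape)                   # torch.Size([4, 5, p, p])
```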
Related papers
- Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget [53.311109531586844]
We demonstrate very low-cost training of large-scale T2I diffusion transformer models.
We train a 1.16 billion parameter sparse transformer at an economical cost of only $1,890 and achieve a 12.7 FID in zero-shot generation.
We aim to release our end-to-end training pipeline to further democratize the training of large-scale diffusion models on micro-budgets.
arXiv Detail & Related papers (2024-07-22T17:23:28Z)
- ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR-10, ImageNet 64$\times$64, and LSUN Cat 256$\times$256.
arXiv Detail & Related papers (2023-11-23T16:49:06Z)
- Masked Diffusion Models Are Fast Distribution Learners [32.485235866596064]
Diffusion models are commonly trained to learn all fine-grained visual information from scratch.
We show that it suffices to train a strong diffusion model by first pre-training the model to learn some primer distribution.
Then the pre-trained model can be fine-tuned for various generation tasks efficiently.
arXiv Detail & Related papers (2023-06-20T08:02:59Z)
- Fast Training of Diffusion Models with Masked Transformers [107.77340216247516]
We propose an efficient approach to train large diffusion models with masked transformers.
Specifically, we randomly mask out a high proportion of patches in diffused input images during training.
Experiments on ImageNet 256$\times$256 and ImageNet 512$\times$512 show that our approach achieves generative performance competitive with, and even better than, the state-of-the-art Diffusion Transformer (DiT) model.
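As a rough illustration of that patch-masking idea (not the authors' code), the sketch below drops a random subset of patch tokens before they would reach a transformer backbone; the 50% mask ratio and the helper name `mask_patch_tokens` are assumptions.

```python
import torch


def mask_patch_tokens(tokens, mask_ratio=0.5):
    """Randomly drop a fraction of patch tokens and keep the rest.

    tokens: (B, N, D) patch embeddings of a diffused (noised) image.
    Returns the kept tokens (B, N_keep, D) plus the kept indices, so the
    full sequence could be reassembled later for the training loss.
    """
    B, N, D = tokens.shape
    n_keep = max(1, int(N * (1.0 - mask_ratio)))

    # Independent random ordering per sample; keep the first n_keep tokens.
    noise = torch.rand(B, N, device=tokens.device)
    ids_keep = noise.argsort(dim=1)[:, :n_keep]

    kept = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return kept, ids_keep


# Usage: only the kept tokens are processed by the expensive transformer
# blocks, which is where the training speed-up comes from.
tokens = torch.randn(8, 256, 768)   # e.g. 16x16 patches of a 256x256 image
kept, ids = mask_patch_tokens(tokens, mask_ratio=0.5)
print(kept.shape)                   # torch.Size([8, 128, 768])
```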
arXiv Detail & Related papers (2023-06-15T17:38:48Z)
- DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning [51.151805100550625]
This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models.
Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters.
Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one at minimal added cost.
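For illustration only, here is a generic parameter-efficient fine-tuning sketch in PyTorch: it freezes a pre-trained model and leaves only a small, keyword-selected subset of parameters (bias terms in this example) trainable. The keyword choice and the helper name `mark_parameter_efficient` are assumptions and do not reproduce DiffFit's exact parameter selection.

```python
import torch
import torch.nn as nn


def mark_parameter_efficient(model: nn.Module, trainable_keywords=("bias",)):
    """Freeze a pre-trained model except for a tiny subset of parameters.

    Only parameters whose names contain one of `trainable_keywords` stay
    trainable; everything else is frozen. The keyword choice (bias terms)
    is a common parameter-efficient default, not DiffFit's exact recipe.
    """
    n_total, n_train = 0, 0
    for name, p in model.named_parameters():
        p.requires_grad = any(k in name for k in trainable_keywords)
        n_total += p.numel()
        n_train += p.numel() if p.requires_grad else 0
    print(f"trainable: {n_train}/{n_total} ({100.0 * n_train / n_total:.2f}%)")
    return model


# Usage with any pre-trained module, e.g. a diffusion U-Net or DiT backbone:
model = mark_parameter_efficient(
    nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 8)))
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```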
arXiv Detail & Related papers (2023-04-13T16:17:50Z)
- Better Diffusion Models Further Improve Adversarial Training [97.44991845907708]
It has been recognized that data generated by the denoising diffusion probabilistic model (DDPM) improves adversarial training.
This paper gives an affirmative answer to whether better diffusion models can further improve adversarial training, by employing a more recent diffusion model with higher efficiency.
Our adversarially trained models achieve state-of-the-art performance on RobustBench using only generated data.
arXiv Detail & Related papers (2023-02-09T13:46:42Z)