Simple ReFlow: Improved Techniques for Fast Flow Models
- URL: http://arxiv.org/abs/2410.07815v1
- Date: Thu, 10 Oct 2024 11:00:55 GMT
- Title: Simple ReFlow: Improved Techniques for Fast Flow Models
- Authors: Beomsu Kim, Yu-Guan Hsieh, Michal Klein, Marco Cuturi, Jong Chul Ye, Bahjat Kawar, James Thornton
- Abstract summary: Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps.
We propose seven improvements for training dynamics, learning and inference.
We achieve state-of-the-art FID scores (without / with guidance, resp.) for fast generation via neural ODEs.
- Score: 68.32300636049008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps; this slows inference and limits applicability to time-critical tasks. The ReFlow procedure can accelerate sampling by straightening generation trajectories. However, ReFlow is an iterative procedure, typically requiring training on simulated data, and results in reduced sample quality. To mitigate sample deterioration, we examine the design space of ReFlow and highlight potential pitfalls in prior heuristic practices. We then propose seven improvements for training dynamics, learning and inference, which are verified with thorough ablation studies on CIFAR10 $32 \times 32$, AFHQv2 $64 \times 64$, and FFHQ $64 \times 64$. Combining all our techniques, we achieve state-of-the-art FID scores (without / with guidance, resp.) for fast generation via neural ODEs: $2.23$ / $1.98$ on CIFAR10, $2.30$ / $1.91$ on AFHQv2, $2.84$ / $2.67$ on FFHQ, and $3.49$ / $1.74$ on ImageNet-64, all with merely $9$ neural function evaluations.
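To make the procedure concrete, here is a minimal PyTorch sketch of one ReFlow iteration: simulate a pretrained (teacher) flow ODE to pair noise with samples, then regress a student velocity field onto the straight line between each pair. This is the textbook baseline the paper refines, not its exact recipe; `teacher`, `student`, and the Euler settings are illustrative placeholders.

```python
import torch

def reflow_step(teacher, student, optimizer, batch_size, dim, n_ode_steps=100):
    """One simplified ReFlow iteration (baseline sketch, not the paper's recipe)."""
    # 1) Build a coupled (noise, sample) pair by integrating the teacher ODE.
    with torch.no_grad():
        x0 = torch.randn(batch_size, dim)            # noise endpoints at t = 0
        x, dt = x0.clone(), 1.0 / n_ode_steps
        for i in range(n_ode_steps):                 # forward Euler along the teacher flow
            t = torch.full((batch_size, 1), i * dt)
            x = x + dt * teacher(x, t)
        x1 = x                                       # simulated data endpoints at t = 1

    # 2) Flow matching: regress the student onto the straight-line velocity x1 - x0.
    t = torch.rand(batch_size, 1)                    # random interpolation times
    xt = (1 - t) * x0 + t * x1                       # linear interpolant between the pair
    loss = ((student(xt, t) - (x1 - x0)) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training on these simulated pairs is what straightens the trajectories, and it is also where the abstract's "reduced sample quality" enters: errors in the teacher's samples become the student's regression targets.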
Related papers
- Improving the Training of Rectified Flows [14.652876697052156]
Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE.
One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error.
We propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting.
Our improved rectified flow outperforms state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step generation; a minimal few-step sampling sketch follows this entry.
arXiv Detail & Related papers (2024-05-30T17:56:04Z)
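Concretely, the low-NFE setting above means integrating the learned ODE with only a handful of solver steps. A minimal sketch with a plain Euler integrator and a uniform time grid (both illustrative; actual pipelines tune the solver and schedule):

```python
import torch

@torch.no_grad()
def euler_sample(velocity, shape, nfe=9):
    """Draw samples with exactly `nfe` evaluations of the learned velocity field."""
    x = torch.randn(shape)                      # Gaussian noise at t = 0
    ts = torch.linspace(0.0, 1.0, nfe + 1)      # uniform time grid (illustrative)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        t = torch.full((shape[0], 1), t0.item())
        x = x + (t1 - t0) * velocity(x, t)      # one Euler step = one NFE
    return x
```

The straighter the learned trajectories, the less each Euler step deviates from the true ODE path, which is why rectification and few-step sampling go hand in hand.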
- Efficient Verification-Based Face Identification [50.616875565173274]
We study the problem of performing face verification with an efficient neural model $f$.
Our approach yields a substantially smaller $f$, requiring only 23k parameters and 5M floating-point operations (FLOPs).
We use six face verification datasets to demonstrate that our method is on par or better than state-of-the-art models.
arXiv Detail & Related papers (2023-12-20T18:08:02Z)
- ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on the CIFAR10, ImageNet $64 \times 64$, and LSUN Cat $256 \times 256$ datasets; a simplified sketch follows this entry.
arXiv Detail & Related papers (2023-11-23T16:49:06Z)
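As a rough illustration of the discriminator-augmented consistency training described above, here is a simplified sketch; the paper's parametrization, noise schedule, and loss weighting differ, and `f`, `f_ema`, `disc`, and the scalar noise levels `t0 < t1` are assumed inputs.

```python
import torch
import torch.nn.functional as F

def act_step(f, f_ema, disc, opt_f, opt_d, x_real, t0, t1, gan_weight=0.1):
    """One adversarial consistency-training step (simplified sketch)."""
    z = torch.randn_like(x_real)
    opt_f.zero_grad()
    student = f(x_real + t1 * z, t1)             # denoise from the higher noise level
    with torch.no_grad():
        teacher = f_ema(x_real + t0 * z, t0)     # EMA model at the adjacent lower level
    consistency = F.mse_loss(student, teacher)   # tie outputs across noise levels

    logits = disc(student)                       # generator side of the GAN loss
    gan_g = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    (consistency + gan_weight * gan_g).backward()
    opt_f.step()

    opt_d.zero_grad()                            # discriminator: real vs. generated
    l_real, l_fake = disc(x_real), disc(student.detach())
    d_loss = (F.binary_cross_entropy_with_logits(l_real, torch.ones_like(l_real)) +
              F.binary_cross_entropy_with_logits(l_fake, torch.zeros_like(l_fake)))
    d_loss.backward()
    opt_d.step()
```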
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation [33.70116170511312]
We propose a novel text-conditioned pipeline to turn Stable Diffusion (SD) into an ultra-fast one-step model.
We create the first one-step diffusion-based text-to-image generator with SD-level image quality, achieving an FID of $23.3$ on MS COCO 2017-5k.
arXiv Detail & Related papers (2023-09-12T16:42:09Z)
- Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
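For context, the guided objective mentioned above builds on the standard trajectory-balance (TB) loss, sketched here; the guiding mechanism itself is not shown.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Standard TB loss for one trajectory: in log space, Z * prod P_F(s'|s)
    should equal R(x) * prod P_B(s|s')."""
    resid = log_Z + log_pf.sum() - log_reward - log_pb.sum()
    return resid ** 2

# Toy usage with made-up log-probabilities along a length-3 trajectory.
log_Z = torch.zeros(1, requires_grad=True)
loss = trajectory_balance_loss(
    log_Z,
    log_pf=torch.log(torch.tensor([0.5, 0.25, 0.5])),   # forward policy steps
    log_pb=torch.log(torch.tensor([1.0, 0.5, 0.5])),    # backward policy steps
    log_reward=torch.tensor(-1.0),                      # log R(x) at the terminal state
)
loss.backward()
```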
- Better Training of GFlowNets with Local Credit and Incomplete Trajectories [81.14310509871935]
We consider the case where the energy function can be applied not just to terminal states but also to intermediate states.
This is for example achieved when the energy function is additive, with terms available along the trajectory.
This enables a training objective that can be applied to update parameters even with incomplete trajectories.
arXiv Detail & Related papers (2023-02-03T12:19:42Z)
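A sketch of the local-credit idea above: when the energy is additive along the trajectory, each transition contributes its own detailed-balance-style residual, so gradients are available even from incomplete trajectories. This is a simplification, not the paper's exact forward-looking parametrization.

```python
def local_credit_loss(log_F_s, log_F_next, log_pf, log_pb, E_s, E_next):
    """Per-transition loss for s -> s' using intermediate energies as local credit.

    All arguments are floats (or tensors): log state-flow estimates, the forward
    and backward policy log-probs for this edge, and the additive energy at each state.
    """
    lhs = log_F_s - E_s + log_pf          # energy-adjusted flow into the edge
    rhs = log_F_next - E_next + log_pb    # energy-adjusted flow out of the edge
    return (lhs - rhs) ** 2               # one loss term per edge, no terminal state needed
```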
- FInC Flow: Fast and Invertible $k \times k$ Convolutions for Normalizing Flows [2.156373334386171]
Invertible convolutions have been an essential element for building expressive normalizing flow-based generative models.
We propose a $k \times k$ convolutional layer and a Deep Normalizing Flow architecture; a simplified invertible-convolution sketch follows this entry.
arXiv Detail & Related papers (2023-01-23T04:31:03Z)
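For background, the sketch below shows the simpler Glow-style invertible $1 \times 1$ convolution as a stand-in; FInC Flow's contribution is making genuine $k \times k$ kernels fast to invert, which this simplified layer does not reproduce.

```python
import torch
import torch.nn as nn

class Invertible1x1Conv(nn.Module):
    """Glow-style invertible 1x1 convolution (stand-in for the k x k case)."""
    def __init__(self, channels):
        super().__init__()
        w, _ = torch.linalg.qr(torch.randn(channels, channels))  # random orthogonal init
        self.weight = nn.Parameter(w)

    def forward(self, x):                                   # x: (B, C, H, W)
        _, _, h, w = x.shape
        y = torch.einsum('ij,bjhw->bihw', self.weight, x)   # mix channels per pixel
        logdet = h * w * torch.slogdet(self.weight)[1]      # Jacobian log-determinant
        return y, logdet

    def inverse(self, y):                                   # exact inverse pass
        return torch.einsum('ij,bjhw->bihw', torch.inverse(self.weight), y)
```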
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
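A loose tabular sketch of the reference-advantage idea behind the entry above: the next-state value is split into a reference part, averaged over many visits, plus a small advantage correction. The actual algorithm's UCB bonuses, step sizes, and precise settling rule are omitted, and all names here are illustrative.

```python
import numpy as np

def vr_q_update(Q, Q_ref, mu_ref, mu_adv, n_visits, s, a, r, s_next,
                settled, gamma=0.99):
    """One variance-reduced Q-update with an early-settled reference (sketch).

    Q, Q_ref : (S, A) arrays, the running and reference Q-estimates
    mu_ref, mu_adv, n_visits : (S,) arrays of running means and visit counts
    """
    n_visits[s_next] += 1
    v, v_ref = Q[s_next].max(), Q_ref[s_next].max()
    mu_ref[s_next] += (v_ref - mu_ref[s_next]) / n_visits[s_next]      # low-variance reference mean
    mu_adv[s_next] += (v - v_ref - mu_adv[s_next]) / n_visits[s_next]  # advantage correction
    lr = 1.0 / n_visits[s_next]                                        # illustrative step size
    Q[s, a] += lr * (r + gamma * (mu_ref[s_next] + mu_adv[s_next]) - Q[s, a])
    if not settled:
        Q_ref[s_next] = Q[s_next].copy()   # keep syncing until the reference settles early on
```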
- Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition [23.496223778642758]
Long-range history context is distilled into an augmented memory bank to reduce self-attention's computation complexity.
A cache mechanism saves the computation for the key and value in self-attention for the left context.
Under an average latency of 960 ms, Emformer achieves a WER of $2.50\%$ on test-clean and $5.62\%$ on test-other; a sketch of the key/value caching idea follows this entry.
arXiv Detail & Related papers (2020-10-21T04:38:09Z)
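A simplified sketch of the key/value caching idea described above: keys and values for left-context frames are computed once and reused across chunks. The augmented memory bank and Emformer's exact segmenting are not reproduced; `q_proj`, `k_proj`, and `v_proj` are assumed caller-supplied projections.

```python
import torch
import torch.nn.functional as F

def streaming_attention(q_proj, k_proj, v_proj, chunk, cache, max_left=8):
    """Attend over a new chunk plus cached left context (single head, sketch).

    chunk : (T, D) frames of the current segment
    cache : dict holding previously computed 'k' and 'v' tensors (may be empty)
    """
    k_new, v_new = k_proj(chunk), v_proj(chunk)       # compute K/V once per frame
    k = torch.cat([cache['k'], k_new]) if 'k' in cache else k_new
    v = torch.cat([cache['v'], v_new]) if 'v' in cache else v_new
    cache['k'], cache['v'] = k[-max_left:], v[-max_left:]    # bounded left context

    q = q_proj(chunk)
    attn = F.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)  # (T, T + left)
    return attn @ v                                          # (T, D) outputs
```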