Flow Matching in Latent Space
- URL: http://arxiv.org/abs/2307.08698v1
- Date: Mon, 17 Jul 2023 17:57:56 GMT
- Title: Flow Matching in Latent Space
- Authors: Quan Dao, Hao Phung, Binh Nguyen, Anh Tran
- Abstract summary: Flow matching is a framework to train generative models that exhibits impressive empirical performance.
We propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency.
Our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks.
- Score: 2.9330609943398525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Flow matching is a recent framework to train generative models that
exhibits impressive empirical performance while being easier to train than
diffusion-based models. Despite its advantageous properties,
prior methods still face the challenges of expensive computing and a large
number of function evaluations of off-the-shelf solvers in the pixel space.
Furthermore, although latent-based generative methods have shown great success
in recent years, this particular model type remains underexplored in this area.
In this work, we propose to apply flow matching in the latent spaces of
pretrained autoencoders, which offers improved computational efficiency and
scalability for high-resolution image synthesis. This enables flow-matching
training on constrained computational resources while maintaining quality
and flexibility. Additionally, our work stands as a pioneering contribution in
the integration of various conditions into flow matching for conditional
generation tasks, including label-conditioned image generation, image
inpainting, and semantic-to-image generation. Through extensive experiments,
our approach demonstrates its effectiveness in both quantitative and
qualitative results on various datasets, such as CelebA-HQ, FFHQ, LSUN Church &
Bedroom, and ImageNet. We also provide a theoretical control of the
Wasserstein-2 distance between the reconstructed latent flow distribution and
true data distribution, showing it is upper-bounded by the latent flow matching
objective. Our code will be available at
https://github.com/VinAIResearch/LFM.git.
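As a rough illustration of the training objective described above (not the authors' implementation), latent flow matching regresses a velocity field onto straight-line interpolants between base noise and autoencoder latents. The sketch below substitutes a toy 4-d "latent" distribution for a pretrained encoder and a linear model for the paper's network; only the interpolant and regression target follow the standard flow-matching recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for autoencoder latents: 4-d codes from a shifted Gaussian.
# A real setup would encode images with a pretrained VAE instead.
def sample_data_latents(n):
    return rng.normal(loc=2.0, scale=0.5, size=(n, 4))

# A linear velocity model v(x, t) = x @ W + t * b stands in for the U-Net.
W = np.zeros((4, 4))
b = np.zeros(4)
lr, losses = 0.05, []

for step in range(500):
    x1 = sample_data_latents(64)          # target latents
    x0 = rng.normal(size=(64, 4))         # base noise
    t = rng.uniform(size=(64, 1))
    xt = (1 - t) * x0 + t * x1            # linear interpolant x_t
    target = x1 - x0                      # conditional velocity target
    err = xt @ W + t * b - target
    losses.append(np.mean(err ** 2))      # flow matching (MSE) loss
    W -= lr * xt.T @ err / len(xt)        # gradient step on the loss
    b -= lr * (t * err).sum(axis=0) / len(xt)
```

Because the model only ever sees 4-d latent codes rather than pixels, the per-step cost is governed by the latent dimensionality, which is the computational argument the abstract makes.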
Related papers
- DeFoG: Discrete Flow Matching for Graph Generation [45.037260759871124]
We propose DeFoG, a novel framework using discrete flow matching for graph generation.
DeFoG employs a flow-based approach that features an efficient linear noising process and a flexible denoising process.
We show that DeFoG achieves state-of-the-art results on synthetic and molecular datasets.
arXiv Detail & Related papers (2024-10-05T18:52:54Z)
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- Improving GFlowNets for Text-to-Image Diffusion Alignment [48.42367859859971]
We explore techniques that do not directly maximize the reward but rather generate high-reward images with relatively high probability.
Our method could effectively align large-scale text-to-image diffusion models with given reward information.
arXiv Detail & Related papers (2024-06-02T06:36:46Z)
- Bellman Optimal Stepsize Straightening of Flow-Matching Models [14.920260435839992]
This paper introduces the Bellman Optimal Stepsize Straightening (BOSS) technique for distilling flow-matching generative models.
BOSS aims specifically for a few-step efficient image sampling while adhering to a computational budget constraint.
Our results reveal that BOSS achieves substantial gains in efficiency while maintaining competitive sample quality.
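For context on why few-step sampling matters (an illustrative sketch, not BOSS itself): generating from a flow-matching model means integrating the learned velocity ODE, and the number of solver steps is exactly the number of function evaluations. The sketch below uses a closed-form Gaussian-to-Gaussian velocity field in place of a trained network:

```python
import numpy as np

def velocity(x, t, mu=3.0, s=0.5):
    """Marginal velocity field transporting N(0, 1) to N(mu, s^2) along the
    linear interpolation path (a known closed form, standing in for a
    trained network)."""
    sigma = (1 - t) + t * s
    return mu + (s - 1) * (x - t * mu) / sigma

def euler_sample(x0, n_steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps.
    n_steps is the total number of function evaluations."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x

# Starting from the base mean, the trajectory is a straight line, so even a
# handful of Euler steps transports each coordinate onto the target mean mu.
x = euler_sample(np.zeros(5))
```

For trajectories like this one the velocity is constant along the path and coarse steps suffice; for learned, curved fields the placement of step sizes matters, which is the budgeted-stepsize problem BOSS addresses.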
arXiv Detail & Related papers (2023-12-27T05:20:20Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- NeurInt: Learning to Interpolate through Neural ODEs [18.104328632453676]
We propose a novel generative model that learns a distribution of trajectories between two images.
We demonstrate our approach's effectiveness in generating images of improved quality, as well as its ability to learn a diverse distribution over smooth trajectories for any pair of real source and target images.
arXiv Detail & Related papers (2021-11-07T16:31:18Z)
- DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows [145.83812019515818]
We propose DeFlow, a method for learning image degradations from unpaired data.
We model the degradation process in the latent space of a shared flow-decoder network.
We validate our DeFlow formulation on the task of joint image restoration and super-resolution.
arXiv Detail & Related papers (2021-01-14T18:58:01Z)
- Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in the latent space of normalizing flows through multi-scale autoregressive priors (mAR).
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.