Purrception: Variational Flow Matching for Vector-Quantized Image Generation
- URL: http://arxiv.org/abs/2510.01478v1
- Date: Wed, 01 Oct 2025 21:41:30 GMT
- Title: Purrception: Variational Flow Matching for Vector-Quantized Image Generation
- Authors: Răzvan-Andrei Matişan, Vincent Tao Hu, Grigory Bartosh, Björn Ommer, Cees G. M. Snoek, Max Welling, Jan-Willem van de Meent, Mohammad Mahdi Derakhshani, Floor Eijkelboom
- Abstract summary: Purrception is a variational flow matching approach for vector-quantized image generation. Our method adapts Variational Flow Matching to vector-quantized latents by learning categorical posteriors over codebook indices. This combines the geometric awareness of continuous methods with the discrete supervision of categorical approaches.
- Score: 79.74708247230218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Purrception, a variational flow matching approach for vector-quantized image generation that provides explicit categorical supervision while maintaining continuous transport dynamics. Our method adapts Variational Flow Matching to vector-quantized latents by learning categorical posteriors over codebook indices while computing velocity fields in the continuous embedding space. This combines the geometric awareness of continuous methods with the discrete supervision of categorical approaches, enabling uncertainty quantification over plausible codes and temperature-controlled generation. We evaluate Purrception on ImageNet-1k 256x256 generation. Training converges faster than both continuous flow matching and discrete flow matching baselines while achieving competitive FID scores with state-of-the-art models. This demonstrates that Variational Flow Matching can effectively bridge continuous transport and discrete supervision for improved training efficiency in image generation.
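The core mechanism described in the abstract, learning a categorical posterior over codebook indices and recovering the continuous velocity from the posterior-mean embedding, can be sketched in a few lines. Everything below is illustrative: the sizes, the linear stand-in for the posterior network, and the clipping constant are assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: B latent tokens, K codebook entries, D embedding dims.
B, K, D = 8, 512, 16
codebook = rng.normal(size=(K, D))          # frozen VQ codebook embeddings

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# One (forward-only) training example on a linear probability path.
idx = rng.integers(0, K, size=B)            # ground-truth code indices
x1 = codebook[idx]                          # target embeddings
x0 = rng.normal(size=(B, D))                # noise sample
t = rng.uniform(size=(B, 1))
x_t = (1 - t) * x0 + t * x1                 # interpolant at time t

# Toy stand-in for the model: any map (x_t, t) -> logits over the K codes.
W = rng.normal(size=(D + 1, K)) * 0.01
logits = np.concatenate([x_t, t], axis=1) @ W

# Explicit categorical supervision: cross-entropy on the true code index.
probs = softmax(logits)
loss = -np.log(probs[np.arange(B), idx]).mean()

# The velocity field lives in the continuous embedding space: take the
# posterior-mean embedding and point from x_t toward it.
tau = 1.0                                   # sampling temperature
probs_tau = softmax(logits / tau)
x1_hat = probs_tau @ codebook               # expected endpoint embedding
v = (x1_hat - x_t) / np.clip(1 - t, 1e-3, None)
```

Because the velocity is derived from a softmax over codes, sharpening or flattening `tau` directly trades fidelity against diversity at sampling time, which is one way to read the abstract's "temperature-controlled generation", and the per-code probabilities give the uncertainty over plausible codes.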
Related papers
- SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training [54.8494905524997]
Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes.
We propose SENTINEL, a verification mechanism for pipeline parallelism (PP) training without duplication.
Experiments demonstrate successful training of up to 4B-parameter LLMs across untrusted distributed environments with up to 176 workers while maintaining model convergence and performance.
arXiv Detail & Related papers (2026-03-03T23:51:10Z)
- Euphonium: Steering Video Flow Matching via Process Reward Gradient Guided Stochastic Dynamics [49.242224984144904]
We propose Euphonium, a novel framework that steers generation via process reward gradient guided dynamics.
Our key insight is to formulate the sampling process as a theoretically principled algorithm that explicitly incorporates the gradient of a Process Reward Model.
We derive a distillation objective that internalizes the guidance signal into the flow network, eliminating inference-time dependency on the reward model.
arXiv Detail & Related papers (2026-02-04T08:59:57Z)
- Temporal Pair Consistency for Variance-Reduced Flow Matching [13.328987133593154]
Temporal Pair Consistency (TPC) is a lightweight variance-reduction principle that couples velocity predictions at paired timesteps along the same probability path.
Instantiated within flow matching, TPC improves sample quality and efficiency across CIFAR-10 and ImageNet at multiple resolutions.
arXiv Detail & Related papers (2026-02-04T00:05:21Z)
- Flowception: Temporally Expansive Flow Matching for Video Generation [35.14803469800522]
Flowception is a non-autoregressive and variable-length video generation framework.
It learns a probability path that interleaves discrete frame insertions with continuous frame denoising.
By learning to insert and denoise frames in a sequence, Flowception seamlessly integrates different tasks such as image-to-video generation and video.
arXiv Detail & Related papers (2025-12-12T10:23:47Z)
- OSCAR: Orthogonal Stochastic Control for Alignment-Respecting Diversity in Flow Matching [14.664226708184676]
Flow-based text-to-image models follow deterministic trajectories, forcing users to repeatedly sample to discover diverse modes.
We present a training-free, inference-time control mechanism that makes the flow itself diversity-aware.
arXiv Detail & Related papers (2025-10-10T07:07:19Z)
- Transport Based Mean Flows for Generative Modeling [19.973366424307077]
Flow-matching generative models have emerged as a powerful paradigm for continuous data generation.
These models suffer from slow inference due to the requirement of numerous sequential sampling steps.
Recent work has sought to accelerate inference by reducing the number of sampling steps.
arXiv Detail & Related papers (2025-09-26T17:12:19Z)
- Image Tokenizer Needs Post-Training [76.91832192778732]
We propose a novel tokenizer training scheme, focusing on improving latent space construction and decoding.
Specifically, we propose a plug-and-play tokenizer training scheme that significantly enhances the robustness of the tokenizer.
We further optimize the tokenizer decoder against a well-trained generative model to mitigate the distribution difference between generated and reconstructed tokens.
arXiv Detail & Related papers (2025-09-15T21:38:03Z)
- Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields [7.435063833417364]
Flow matching casts sample generation as learning a continuous-time velocity field that transports noise to data.
We propose Graph Flow Matching, a lightweight enhancement that decomposes the learned velocity into a reaction term, operating in the latent space of a pretrained variational autoencoder.
arXiv Detail & Related papers (2025-05-30T10:17:50Z)
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints.
During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image.
Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- Improving Consistency Models with Generator-Augmented Flows [16.049476783301724]
Consistency models imitate the multi-step sampling of score-based diffusion in a single forward pass of a neural network.
They can be learned in two ways: consistency distillation and consistency training.
We propose a novel flow that transports noisy data towards their corresponding outputs derived from a consistency model.
arXiv Detail & Related papers (2024-06-13T20:22:38Z)
- OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation [55.676358801492114]
We propose OCAI, a method that supports robust frame ambiguities by generating intermediate video frames alongside optical flows in between.
Our evaluations demonstrate superior quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.
arXiv Detail & Related papers (2024-03-26T20:23:48Z)
- Flow Matching in Latent Space [2.9330609943398525]
Flow matching is a framework to train generative models that exhibits impressive empirical performance.
We propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency.
Our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks.
arXiv Detail & Related papers (2023-07-17T17:57:56Z)
- End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z)
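Several of the flow-matching papers listed above share one sampling primitive: numerically integrating a learned velocity field from noise at t=0 to data at t=1. A minimal Euler-integration sketch, with a toy closed-form velocity toward a fixed target standing in for a trained network (the target `mu`, step count, and clamp are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16

# Toy stand-in velocity field for a linear probability path toward a fixed
# target mu: v(x, t) = (mu - x) / (1 - t) transports any x0 to mu at t = 1.
mu = rng.normal(size=D)

def velocity(x, t):
    return (mu - x) / max(1.0 - t, 1e-3)   # clamp avoids division by zero

# Plain Euler integration from noise (t = 0) to data (t = 1).
steps = 100
dt = 1.0 / steps
x = rng.normal(size=D)                      # start from Gaussian noise
for i in range(steps):
    x = x + dt * velocity(x, i * dt)

err = float(np.linalg.norm(x - mu))         # distance to the target
```

For this linear field the Euler iterates telescope exactly onto the target; with a real learned network the step count becomes the cost/quality trade-off that several of the acceleration papers above (mean flows, consistency models) aim to eliminate.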
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.