STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
- URL: http://arxiv.org/abs/2506.06276v1
- Date: Fri, 06 Jun 2025 17:58:39 GMT
- Title: STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
- Authors: Jiatao Gu, Tianrong Chen, David Berthelot, Huangjie Zheng, Yuyang Wang, Ruixiang Zhang, Laurent Dinh, Miguel Angel Bautista, Josh Susskind, Shuangfei Zhai
- Abstract summary: We present a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers.
- Score: 44.2114053357308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model's representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.
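The deep-shallow design and the exact-likelihood claim in the abstract are concrete enough to sketch. Below is a minimal PyTorch sketch of a TARFlow-style stack of autoregressive affine flow blocks with one deep and several shallow Transformers; the class names, layer counts, tanh-bounded scales, and order reversal between blocks are illustrative assumptions, not the authors' released implementation.

```python
import math

import torch
import torch.nn as nn


class ARFlowBlock(nn.Module):
    """One autoregressive affine flow over a sequence of continuous tokens.

    A causal Transformer reads tokens x_<t and predicts a shift and log-scale
    for token x_t; the Jacobian is triangular, so its log-determinant is just
    the (negated) sum of the predicted log-scales.
    """

    def __init__(self, dim: int, depth: int, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dropout=0.0, batch_first=True, norm_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.to_shift_logscale = nn.Linear(dim, 2 * dim)

    def forward(self, x):
        B, T, D = x.shape
        # Right-shift the sequence so position t only conditions on tokens < t.
        h = torch.cat([torch.zeros(B, 1, D, device=x.device), x[:, :-1]], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(x.device)
        h = self.encoder(h, mask=mask)
        shift, log_scale = self.to_shift_logscale(h).chunk(2, dim=-1)
        log_scale = torch.tanh(log_scale)  # bound scales for stability (an assumption)
        z = (x - shift) * torch.exp(-log_scale)
        log_det = -log_scale.sum(dim=(1, 2))  # exact log|det J| via triangular structure
        return z, log_det


class DeepShallowFlow(nn.Module):
    """Deep-shallow stack: one deep block carries most of the capacity,
    followed by a few cheap shallow blocks; token order is flipped between
    blocks so every position eventually conditions on every other."""

    def __init__(self, dim=64, deep_depth=12, shallow_depth=1, n_shallow=3):
        super().__init__()
        depths = [deep_depth] + [shallow_depth] * n_shallow
        self.blocks = nn.ModuleList(ARFlowBlock(dim, d) for d in depths)

    def forward(self, x):
        total_log_det = torch.zeros(x.shape[0], device=x.device)
        for block in self.blocks:
            x, log_det = block(x)
            total_log_det = total_log_det + log_det
            x = torch.flip(x, dims=[1])  # reverse the sequence between blocks
        return x, total_log_det


if __name__ == "__main__":
    model = DeepShallowFlow(dim=64)
    tokens = torch.randn(2, 16, 64)  # stand-in for latents from a pretrained autoencoder
    z, log_det = model(tokens)
    # Exact log-likelihood under a standard normal base distribution.
    log_prob = -0.5 * (z**2).sum(dim=(1, 2)) - 0.5 * z[0].numel() * math.log(2 * math.pi) + log_det
    print(z.shape, log_prob)
```

Because each block's Jacobian is triangular, the total log-determinant is exact and cheap to accumulate, which is what makes the end-to-end maximum likelihood training described in the abstract tractable.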
Related papers
- HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance [70.69373563281324]
HiFlow is a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models.
arXiv Detail & Related papers (2025-04-08T17:30:40Z) - NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers [10.84639914909133]
Flow-based Transformer models have achieved state-of-the-art image generation performance, but often suffer from high inference latency and computational cost. We propose Bridged Progressive Rectified Flow Transformers (NAMI), which decompose the generation process across temporal, spatial, and architectural dimensions.
arXiv Detail & Related papers (2025-03-12T10:38:58Z) - Jet: A Modern Transformer-Based Normalizing Flow [62.2573739835562]
We revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices. We achieve state-of-the-art quantitative and qualitative performance with a much simpler architecture.
arXiv Detail & Related papers (2024-12-19T18:09:42Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improve sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, with a speedup in computation compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Free-form Flows: Make Any Architecture a Normalizing Flow [8.163244519983298]
We develop a training procedure that uses an efficient estimator for the gradient of the change of variables formula (the underlying identity is written out after this list).
This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training.
We achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks.
arXiv Detail & Related papers (2023-10-25T13:23:08Z) - A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model [43.53802699867521]
We study the cooperative learning of two generative flow models, in which the two models are iteratively updated based on jointly synthesized examples.
We show that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images.
arXiv Detail & Related papers (2022-05-13T23:12:38Z) - GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
arXiv Detail & Related papers (2021-11-26T18:59:56Z) - CAFLOW: Conditional Autoregressive Flows [1.2233362977312945]
We introduce CAFLOW, a new diverse image-to-image translation model.
We transform the conditioning image into a sequence of latent encodings using a multi-scale normalizing flow.
Our proposed framework performs well on a range of image-to-image translation tasks.
arXiv Detail & Related papers (2021-06-04T14:57:41Z) - Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in the latent space of flow models through multi-scale autoregressive priors (mAR).
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z)
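For reference, the exact-likelihood training that STARFlow performs end-to-end, and whose gradient the Free-form Flows entry above proposes to estimate for arbitrary dimension-preserving networks, rests on the standard change-of-variables identity (a textbook fact, not a result of either paper). For an invertible map $f_\theta$ from data $x$ to a base variable $z = f_\theta(x)$ with density $p_Z$:

$$\log p_X(x) = \log p_Z\big(f_\theta(x)\big) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|$$

Autoregressive flows such as TARFlow make the Jacobian triangular, so the log-determinant reduces to a sum of per-dimension log-scales and is evaluated exactly; Free-form Flows instead estimate the gradient of the log-determinant so that the network architecture need not be constrained.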
This list is automatically generated from the titles and abstracts of the papers on this site.