Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation
- URL: http://arxiv.org/abs/2510.21167v1
- Date: Fri, 24 Oct 2025 05:41:23 GMT
- Title: Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation
- Authors: Dogyun Park, Taehoon Lee, Minseok Joo, Hyunwoo J. Kim
- Abstract summary: Flow Matching models have pushed the boundaries of high-fidelity data generation across a wide range of domains. We propose Blockwise Flow Matching (BFM), a novel framework that partitions the generative trajectory into multiple temporal segments. BFM achieves 2.1x to 4.9x accelerations in inference complexity at comparable generation performance.
- Score: 33.177998521195114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Flow Matching models have pushed the boundaries of high-fidelity data generation across a wide range of domains. These models typically employ a single large network to learn the entire generative trajectory from noise to data. Despite their effectiveness, this design struggles to capture distinct signal characteristics across timesteps simultaneously and incurs substantial inference costs due to the iterative evaluation of the entire model. To address these limitations, we propose Blockwise Flow Matching (BFM), a novel framework that partitions the generative trajectory into multiple temporal segments, each modeled by smaller but specialized velocity blocks. This blockwise design enables each block to specialize effectively in its designated interval, improving inference efficiency and sample quality. To further enhance generation fidelity, we introduce a Semantic Feature Guidance module that explicitly conditions velocity blocks on semantically rich features aligned with pretrained representations. Additionally, we propose a lightweight Feature Residual Approximation strategy that preserves semantic quality while significantly reducing inference cost. Extensive experiments on ImageNet 256x256 demonstrate that BFM establishes a substantially improved Pareto frontier over existing Flow Matching methods, achieving 2.1x to 4.9x accelerations in inference complexity at comparable generation performance. Code is available at https://github.com/mlvlab/BFM.
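The blockwise sampling idea described in the abstract can be sketched as a plain Euler integrator in which each equal-width segment of [0, 1] is handled by its own small velocity network. This is a minimal illustration under assumed uniform segmentation, not the paper's implementation; all class and function names are hypothetical:

```python
import torch
import torch.nn as nn

class TinyVelocityBlock(nn.Module):
    """Hypothetical small velocity network specialized to one temporal segment."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        # Condition on the scalar timestep by concatenating it as an extra feature.
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=1))

def blockwise_sample(blocks, x0, steps_per_block=8):
    """Euler-integrate dx/dt = v_i(x, t) from t=0 to t=1, where block i
    handles the segment [i/n, (i+1)/n)."""
    x, n = x0, len(blocks)
    dt = 1.0 / (n * steps_per_block)
    for i, block in enumerate(blocks):
        for j in range(steps_per_block):
            t = torch.tensor([[i / n + j * dt]])
            x = x + dt * block(x, t)
    return x

# Four small blocks cover the trajectory instead of one large network.
blocks = [TinyVelocityBlock(2) for _ in range(4)]
samples = blockwise_sample(blocks, torch.randn(16, 2))
```

Because each block is only evaluated inside its own interval, the per-step cost is that of a small network rather than the full model, which is where the claimed inference savings come from.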
Related papers
- Domain-Incremental Continual Learning for Robust and Efficient Keyword Spotting in Resource Constrained Systems [0.0]
Keyword Spotting systems with small-footprint models deployed on edge devices face significant accuracy and robustness challenges. We propose a comprehensive framework for continual learning designed to adapt to new domains while maintaining computational efficiency. The proposed pipeline integrates a dual-input Convolutional Neural Network, utilizing both Mel Frequency Cepstral Coefficients (MFCC) and Mel-spectrogram features.
arXiv Detail & Related papers (2026-01-22T17:59:31Z) - Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion [0.4078247440919472]
FLASH (Frequency-aware LiDAR Adaptive Super-resolution with Hierarchical fusion) is a novel framework that overcomes limitations through dual-domain processing. FLASH integrates two key innovations: (i) Frequency-Aware Window Attention that combines local spatial attention with global frequency-domain analysis via FFT, capturing both fine-grained geometry and periodic scanning patterns at log-linear complexity, and (ii) Adaptive Multi-Scale Fusion that replaces conventional skip connections with learned position-specific feature aggregation, enhanced by CBAM attention for dynamic feature selection.
arXiv Detail & Related papers (2025-11-10T18:38:15Z) - Fast-dLLM v2: Efficient Block-Diffusion LLM [64.38006546510337]
Fast-dLLM v2 is a block diffusion language model that adapts pretrained AR models into dLLMs for parallel text generation. This represents a 500x reduction in training data compared to full-attention diffusion LLMs such as Dream (580B tokens).
arXiv Detail & Related papers (2025-09-30T14:40:18Z) - Sortblock: Similarity-Aware Feature Reuse for Diffusion Model [9.749736545966694]
Diffusion Transformers (DiTs) have demonstrated remarkable generative capabilities. DiTs' sequential denoising process results in high inference latency. We propose Sortblock, a training-free inference acceleration framework.
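The cache-and-reuse idea behind a training-free accelerator like Sortblock can be sketched as follows. This is a minimal illustration of similarity-gated feature reuse, not the paper's actual criterion; the function and threshold are hypothetical:

```python
import torch

def reuse_or_compute(block, x, cache, threshold=0.95):
    """If the current input is cosine-similar to the cached input,
    skip the block and return the cached output."""
    if cache["x"] is not None:
        sim = torch.nn.functional.cosine_similarity(
            x.flatten(), cache["x"].flatten(), dim=0)
        if sim > threshold:
            return cache["y"]
    y = block(x)
    cache["x"], cache["y"] = x.detach(), y.detach()
    return y

cache = {"x": None, "y": None}
block = torch.nn.Linear(8, 8)
x = torch.randn(1, 8)
y1 = reuse_or_compute(block, x, cache)           # cache miss: block runs
y2 = reuse_or_compute(block, x.clone(), cache)   # identical input: cached output reused
```

Across adjacent denoising timesteps, intermediate features change slowly, so such a gate can skip many block evaluations without retraining the model.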
arXiv Detail & Related papers (2025-08-01T08:10:54Z) - Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On ImageNet-1k 512px, DFM achieves 35.2% improvements in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z) - DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE introduces a batch-level global token pool that enables experts to access global token distributions during training. It achieves state-of-the-art performance among diffusion models on the ImageNet benchmark. The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
arXiv Detail & Related papers (2025-03-18T17:57:07Z) - Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling [49.215957313126324]
Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling.
However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details.
We propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results.
arXiv Detail & Related papers (2024-05-05T14:05:33Z) - StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation.
We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks.
StreamFlow excels on the challenging KITTI and Sintel datasets, with particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z) - Flow Matching in Latent Space [2.9330609943398525]
Flow matching is a framework to train generative models that exhibits impressive empirical performance.
We propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency.
Our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks.
arXiv Detail & Related papers (2023-07-17T17:57:56Z) - Non-Separable Multi-Dimensional Network Flows for Visual Computing [62.50191141358778]
We propose a novel formalism for non-separable multi-dimensional network flows.
Since the flow is defined on a per-dimension basis, the maximizing flow automatically chooses the best matching feature dimensions.
As a proof of concept, we apply our formalism to the multi-object tracking problem and demonstrate that our approach outperforms scalar formulations on the MOT16 benchmark in terms of robustness to noise.
arXiv Detail & Related papers (2023-05-15T13:21:44Z) - Gradient Sparsification for Efficient Wireless Federated Learning with Differential Privacy [25.763777765222358]
Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other.
As the model size grows, training latency increases due to limited transmission bandwidth, and model utility degrades under differential privacy (DP) protection.
We propose a gradient sparsification empowered FL framework over wireless channels to improve training efficiency without sacrificing convergence performance.
arXiv Detail & Related papers (2023-04-09T05:21:15Z) - Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.