Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement
- URL: http://arxiv.org/abs/2509.15952v2
- Date: Mon, 22 Sep 2025 13:01:44 GMT
- Title: Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement
- Authors: Gang Yang, Yue Lei, Wenxin Tai, Jin Wu, Jia Chen, Ting Zhong, Fan Zhou
- Abstract summary: COSE is a one-step FM framework tailored for speech enhancement. We introduce a velocity composition identity to compute average velocity efficiently. Experiments show that COSE delivers up to 5x faster sampling and reduces training cost by 40%.
- Score: 46.23750572308065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion and flow matching (FM) models have achieved remarkable progress in speech enhancement (SE), yet their dependence on multi-step generation is computationally expensive and vulnerable to discretization errors. Recent advances in one-step generative modeling, particularly MeanFlow, provide a promising alternative by reformulating dynamics through average velocity fields. In this work, we present COSE, a one-step FM framework tailored for SE. To address the high training overhead of Jacobian-vector product (JVP) computations in MeanFlow, we introduce a velocity composition identity to compute average velocity efficiently, eliminating expensive computation while preserving theoretical consistency and achieving competitive enhancement quality. Extensive experiments on standard benchmarks show that COSE delivers up to 5x faster sampling and reduces training cost by 40%, all without compromising speech quality. Code is available at https://github.com/ICDM-UESTC/COSE.
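As a quick sanity check on the velocity composition idea, the sketch below numerically verifies that, along a trajectory of a toy instantaneous velocity field (the field `v` here is an arbitrary illustration, not the paper's learned model), the time-weighted average velocity over an interval equals the sum of the time-weighted average velocities over its sub-intervals:

```python
import math

# Toy instantaneous velocity field v(z, t) -- an assumption for illustration;
# in COSE/MeanFlow this would be the learned conditional velocity network.
def v(z, t):
    return math.sin(t) * z + math.cos(3.0 * t)

def integrate(z0, t0, t1, steps=10_000):
    """Integrate dz/dt = v(z, t) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    z, t = z0, t0
    for _ in range(steps):
        k1 = v(z, t)
        k2 = v(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = v(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = v(z + h * k3, t + h)
        z += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

def avg_velocity(z_start, t_start, t_end):
    """Average velocity over [t_start, t_end]: displacement / elapsed time."""
    return (integrate(z_start, t_start, t_end) - z_start) / (t_end - t_start)

r, s, t = 0.0, 0.4, 1.0
z_r = 1.0
z_s = integrate(z_r, r, s)  # trajectory state at the intermediate time s

# Velocity composition: the displacement over [r, t] is the sum of the
# displacements over [r, s] and [s, t], so the time-weighted average
# velocities compose additively -- no Jacobian-vector product needed.
lhs = (t - r) * avg_velocity(z_r, r, t)
rhs = (s - r) * avg_velocity(z_r, r, s) + (t - s) * avg_velocity(z_s, s, t)
print(f"|lhs - rhs| = {abs(lhs - rhs):.2e}")
```

The identity holds exactly for the true trajectory (it is just additivity of displacement), which is why it can replace the JVP-based differential identity during training.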
Related papers
- MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows [42.55959060773461]
MeanVoiceFlow is a one-step nonparallel VC model based on mean flows. MeanVoiceFlow achieves performance comparable to that of previous multi-step and distillation-based models.
arXiv Detail & Related papers (2026-02-20T09:48:23Z)
- SoFlow: Solution Flow Models for One-Step Generative Modeling [10.054000663262618]
Solution Flow Models (SoFlow) is a framework for one-step generation from scratch. The Flow Matching loss allows the models to provide estimated velocity fields during training. SoFlow achieves better FID-50K scores than MeanFlow models on the ImageNet 256x256 dataset.
arXiv Detail & Related papers (2025-12-17T18:10:17Z)
- High-Performance Self-Supervised Learning by Joint Training of Flow Matching [1.8659515282266286]
FlowFM is a Flow Matching-based foundation model that reduces training time by 50.4% compared to a diffusion-based approach. On downstream tasks, FlowFM surpasses the state-of-the-art SSL method (SSL-Wearables) on all five datasets.
arXiv Detail & Related papers (2025-12-17T06:35:03Z)
- Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories [14.36205662558203]
Rectified MeanFlow is a framework that models the mean velocity field along the rectified trajectory using only a single reflow step. Experiments on ImageNet at 64, 256, and 512 resolutions show that Re-MeanFlow consistently outperforms prior one-step flow distillation and Rectified Flow methods in both sample quality and training efficiency.
arXiv Detail & Related papers (2025-11-28T16:50:08Z)
- MeanFlow Transformers with Representation Autoencoders [71.45823902973349]
MeanFlow (MF) is a diffusion-motivated generative model that enables efficient few-step generation by learning long jumps directly from noise to data. We develop an efficient training and sampling scheme for MF in the latent space of a Representation Autoencoder (RAE). We achieve a 1-step FID of 2.03, outperforming vanilla MF's 3.43, while reducing sampling GFLOPS by 38% and total training cost by 83% on ImageNet 256.
arXiv Detail & Related papers (2025-11-17T06:17:08Z)
- MeanFlowSE: one-step generative speech enhancement via conditional mean flow [13.437825847370442]
MeanFlowSE is a conditional generative model that learns the average velocity over finite intervals along a trajectory. On VoiceBank-DEMAND, the single-step model achieves strong intelligibility, fidelity, and perceptual quality with substantially lower computational cost than multistep baselines.
arXiv Detail & Related papers (2025-09-18T11:24:47Z)
- SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling [23.539625950964876]
Generative models like Flow Matching have achieved state-of-the-art performance but are often hindered by a computationally expensive iterative sampling process. MeanFlow, a leading method in this area, learns this field by enforcing a differential identity that connects the average and instantaneous velocities. In this work, we argue that this differential formulation is a limiting special case of a more fundamental principle. We introduce SplitMeanFlow, a new training framework that enforces this algebraic consistency directly as a learning objective.
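For context, the two formulations contrasted here can be written out; the notation below is a sketch in the style of the MeanFlow literature, with $u(z_t, r, t)$ the average velocity over $[r, t]$ and $v$ the instantaneous velocity, not the papers' exact equations:

```latex
% Differential (MeanFlow) identity -- the total derivative requires a JVP:
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t - r)\,\tfrac{\mathrm{d}}{\mathrm{d}t}\,u(z_t, r, t),
\qquad
\tfrac{\mathrm{d}}{\mathrm{d}t}\,u \;=\; \partial_t u + v(z_t, t)\,\partial_{z_t} u .

% Algebraic interval-splitting (JVP-free) consistency, for r \le s \le t:
(t - r)\,u(z_t, r, t) \;=\; (s - r)\,u(z_s, r, s) \;+\; (t - s)\,u(z_t, s, t).
```

Both follow from the definition $(t - r)\,u(z_t, r, t) = \int_r^t v(z_\tau, \tau)\,\mathrm{d}\tau$: the first by differentiating in $t$, the second by additivity of the integral over $[r, s]$ and $[s, t]$, which is why the algebraic form avoids the JVP.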
arXiv Detail & Related papers (2025-07-22T16:26:58Z)
- Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient Diffusion Models [50.260693393896716]
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity images but constrained by high computational costs. We propose Flexiffusion, a training-free NAS framework that jointly optimizes generation schedules and model architectures without modifying pre-trained parameters. Our work pioneers a resource-efficient paradigm for searching for high-speed DMs without sacrificing quality.
arXiv Detail & Related papers (2025-06-03T06:02:50Z)
- FlowTS: Time Series Generation via Rectified Flow [67.41208519939626]
FlowTS is an ODE-based model that leverages rectified flow with straight-line transport in probability space. In the unconditional setting, FlowTS achieves state-of-the-art performance, with context FID scores of 0.019 and 0.011 on the Stock and ETTh datasets. In the conditional setting, it achieves superior performance in solar forecasting.
arXiv Detail & Related papers (2024-11-12T03:03:23Z)
- Consistency Flow Matching: Defining Straight Flows with Velocity Consistency [97.28511135503176]
We introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field.
Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models.
arXiv Detail & Related papers (2024-07-02T16:15:37Z)
- Reducing Spatial Discretization Error on Coarse CFD Simulations Using an OpenFOAM-Embedded Deep Learning Framework [0.7223509567556214]
We propose a method for reducing the spatial discretization error of fluid dynamics problems by enhancing the quality of simulations using deep learning.
We feed the model with fine-grid data after projecting it to the coarse-grid discretization.
We replace the default differencing scheme for the convection term with a feed-forward neural network that interpolates velocities from cell centers to face values, producing velocities that closely approximate the down-sampled fine-grid data.
arXiv Detail & Related papers (2024-05-13T02:59:50Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improve sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, achieving a speedup compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.