Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
- URL: http://arxiv.org/abs/2505.12226v1
- Date: Sun, 18 May 2025 04:15:08 GMT
- Title: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
- Authors: Dong Yang, Yiyi Cai, Yuki Saito, Lixu Wang, Hiroshi Saruwatari
- Abstract summary: A shallow flow matching (SFM) mechanism to enhance flow matching (FM)-based text-to-speech (TTS) models. Experiments show that SFM consistently improves the naturalness of synthesized speech in both objective and subjective evaluations.
- Score: 30.98512463695203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a shallow flow matching (SFM) mechanism to enhance flow matching (FM)-based text-to-speech (TTS) models within a coarse-to-fine generation paradigm. SFM constructs intermediate states along the FM paths using coarse output representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled construction strategy based on a single-segment piecewise flow. The SFM inference starts from the intermediate state rather than pure noise and focuses computation on the latter stages of the FM paths. We integrate SFM into multiple TTS models with a lightweight SFM head. Experiments show that SFM consistently improves the naturalness of synthesized speech in both objective and subjective evaluations, while significantly reducing inference cost when using adaptive-step ODE solvers. Demo and codes are available at https://ydqmkkx.github.io/SFMDemo/.
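The abstract's coarse-to-fine idea can be illustrated with a small numerical sketch. Assuming a straight-line FM path x_t = (1 - t) x0 + t x1, a coarse representation is projected orthogonally onto the path to pick a start time, and an Euler solver then integrates only the remaining segment. The function names and the plain Euler solver below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def orthogonal_projection_time(x0, x1, c):
    # Project the coarse state c onto the straight FM path
    # x_t = (1 - t) x0 + t x1 and return the time t* of the
    # closest point, clipped to [0, 1].
    d = x1 - x0
    t = np.dot(c - x0, d) / np.dot(d, d)
    return float(np.clip(t, 0.0, 1.0))

def sfm_inference(velocity_fn, c, t_start, n_steps=10):
    # Euler ODE integration starting from the intermediate state
    # instead of pure noise, so all computation is spent on the
    # latter part of the FM path.
    x, t = c.copy(), t_start
    dt = (1.0 - t_start) / n_steps
    for _ in range(n_steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x
```

With a velocity field that is exact for the straight path (a constant x1 - x0), integrating from the projected intermediate state recovers the endpoint x1, which is the behavior the mechanism relies on.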
Related papers
- Multi-Scale Finetuning for Encoder-based Time Series Foundation Models [56.503053716053]
Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. We argue that naive finetuning falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. We propose Multi-Scale FineTuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process.
arXiv Detail & Related papers (2025-06-17T01:06:01Z)
- Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment [22.661660797545164]
Diffusion models have revolutionized generative tasks through high-fidelity outputs, yet flow matching (FM) offers faster inference and empirical performance gains. This work addresses the critical challenge of efficiently transferring knowledge from pre-trained diffusion models to flow matching. We propose Diff2Flow, a novel framework that systematically bridges diffusion and FM paradigms by rescaling timesteps, aligning interpolants, and deriving FM-compatible velocity fields from diffusion predictions.
arXiv Detail & Related papers (2025-06-02T20:05:05Z)
- DFM: Interpolant-free Dual Flow Matching [0.8192907805418583]
We propose an interpolant-free dual flow matching (DFM) approach without explicit assumptions about the modeled vector field.
Experiments on SMAP unsupervised anomaly detection show advantages of DFM compared to a CNF trained with either maximum likelihood or FM objectives.
arXiv Detail & Related papers (2024-10-11T20:46:04Z)
- Local Flow Matching Generative Models [19.859984725284896]
Local Flow Matching (LFM) is a computational framework for density estimation based on flow-based generative models. LFM employs a simulation-free scheme and incrementally learns a sequence of Flow Matching sub-models. We demonstrate the improved training efficiency and competitive generative performance of LFM compared to FM.
arXiv Detail & Related papers (2024-10-03T14:53:10Z)
- Consistency Flow Matching: Defining Straight Flows with Velocity Consistency [97.28511135503176]
We introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field.
Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models.
arXiv Detail & Related papers (2024-07-02T16:15:37Z)
- On the Evaluation of Speech Foundation Models for Spoken Language Understanding [87.52911510306011]
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking.
The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFMs) for these SLU tasks.
We ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating these SFMs?
arXiv Detail & Related papers (2024-06-14T14:37:52Z)
- FedPFT: Federated Proxy Fine-Tuning of Foundation Models [55.58899993272904]
Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs.
Existing methods fine-tune FMs by allocating sub-FMs to clients in FL, leading to suboptimal performance due to insufficient tuning and inevitable error accumulation in gradients.
We propose Federated Proxy Fine-Tuning (FedPFT), a novel method enhancing FM adaptation to downstream tasks through FL via two key modules.
arXiv Detail & Related papers (2024-04-17T16:30:06Z)
- Optimal Flow Matching: Learning Straight Trajectories in Just One Step [89.37027530300617]
We develop and theoretically justify the novel Optimal Flow Matching (OFM) approach.
It allows recovering the straight OT displacement for the quadratic transport in just one FM step.
The main idea of our approach is to employ vector fields for FM that are parameterized by convex functions.
arXiv Detail & Related papers (2024-03-19T19:44:54Z)
- Precise Knowledge Transfer via Flow Matching [24.772381404849174]
We name this framework Knowledge Transfer with Flow Matching (FM-KT).
FM-KT can be integrated with a metric-based distillation method of any form (e.g., vanilla KD, DKD, PKD, and DIST).
We empirically validate the scalability and state-of-the-art performance of our proposed methods among relevant comparison approaches.
arXiv Detail & Related papers (2024-02-03T03:59:51Z)
- Improving and generalizing flow-based generative models with minibatch optimal transport [90.01613198337833]
We introduce the generalized conditional flow matching (CFM) technique for continuous normalizing flows (CNFs).
CFM features a stable regression objective like that used to train the flow in diffusion models but enjoys the efficient inference of deterministic flow models.
A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference.
arXiv Detail & Related papers (2023-02-01T14:47:17Z)
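The CFM regression target and the OT-CFM minibatch re-pairing described in the entry above can be sketched as follows. The helper names are hypothetical, and scipy's Hungarian assignment solver stands in for the minibatch optimal transport coupling; the actual paper's implementation may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_pairing(x0, x1):
    # Minibatch OT coupling: re-pair noise samples x0 and data samples x1
    # by solving the assignment problem on squared Euclidean cost, which
    # straightens the resulting conditional flows.
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return x0[rows], x1[cols]

def cfm_targets(x0, x1, t):
    # Linear interpolant x_t = (1 - t) x0 + t x1 and its constant target
    # velocity u_t = x1 - x0; a network v(x_t, t) would be regressed onto
    # u_t with a stable MSE objective, as in diffusion-style training.
    xt = (1.0 - t[:, None]) * x0 + t[:, None] * x1
    ut = x1 - x0
    return xt, ut
```

On a toy batch where the identity pairing would make the two flows cross, the OT coupling swaps the pairs so each sample travels to its nearest target, which is the intuition behind OT-CFM's faster, more stable training.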
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.