Fugu-MT Paper Translation (Abstract): UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching

Paper overview: UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching

arxiv url: http://arxiv.org/abs/2510.00771v1
Date: Wed, 01 Oct 2025 11:04:53 GMT
Status: Translation complete
Last updated in system: 2025-10-03 14:32:17.200715
Title: UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
Title (reference translation): UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
Authors: Woongjib Choi, Sangmin Lee, Hyungseob Lim, Hong-Goo Kang
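Abstract summary: This paper proposes a framework for audio super-resolution that uses a flow matching generative model to capture the conditional distribution of complex-valued spectral coefficients. Experiments show that the model consistently produces high-fidelity 48 kHz audio across diverse upsampling factors.
Reference score (independently computed attention score): 20.92242470770289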
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present a vocoder-free framework for audio super-resolution that employs a flow matching generative model to capture the conditional distribution of complex-valued spectral coefficients. Unlike conventional two-stage diffusion-based approaches that predict a mel-spectrogram and then rely on a pre-trained neural vocoder to synthesize waveforms, our method directly reconstructs waveforms via the inverse Short-Time Fourier Transform (iSTFT), thereby eliminating the dependence on a separate vocoder. This design not only simplifies end-to-end optimization but also overcomes a critical bottleneck of two-stage pipelines, where the final audio quality is fundamentally constrained by vocoder performance. Experiments show that our model consistently produces high-fidelity 48 kHz audio across diverse upsampling factors, achieving state-of-the-art performance on both speech and general audio datasets.
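Abstract (reference translation): This paper proposes a vocoder-free framework for audio super-resolution that uses a flow matching generative model to capture the conditional distribution of complex-valued spectral coefficients. Unlike conventional two-stage diffusion-based methods that predict a mel-spectrogram, the method reconstructs waveforms directly with the inverse Short-Time Fourier Transform (iSTFT), removing the dependence on a separate vocoder. This design not only simplifies end-to-end optimization but also overcomes a critical bottleneck of two-stage pipelines. Experiments show that the model consistently produces high-fidelity 48 kHz audio across diverse upsampling factors on both speech and general audio datasets, achieving state-of-the-art performance.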
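
The vocoder-free claim in the abstract rests on a basic property of the STFT: complex-valued coefficients carry phase, so the iSTFT alone returns a waveform. The following is a minimal sketch of that property only, not the authors' implementation. It contains no flow matching model, and the frame size `N_FFT`, the hop `HOP`, the periodic Hann window, and all function names are illustrative assumptions chosen to keep the example small. The `drop_phase` step is a loose stand-in for magnitude-only features; a real mel-spectrogram additionally applies a mel filterbank.

```python
import cmath
import math

N_FFT = 8          # toy frame length (assumption; real systems use far longer frames)
HOP = N_FFT // 2   # 50% overlap


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(n^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft(signal):
    """Complex STFT; len(signal) must be a multiple of HOP."""
    padded = [0.0] * HOP + list(signal) + [0.0] * HOP
    win = hann(N_FFT)
    return [dft([win[i] * padded[start + i] for i in range(N_FFT)])
            for start in range(0, len(padded) - N_FFT + 1, HOP)]


def istft(frames):
    """Overlap-add inverse; exact when the frames keep their phase."""
    out = [0.0] * (HOP * (len(frames) - 1) + N_FFT)
    for j, spec in enumerate(frames):
        for i, value in enumerate(idft(spec)):
            out[j * HOP + i] += value
    return out[HOP:len(out) - HOP]  # drop the padding added by stft


def drop_phase(frames):
    """Keep only magnitudes, discarding the phase of every coefficient."""
    return [[abs(c) for c in spec] for spec in frames]


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
    spec = stft(tone)
    print("error with complex coefficients:", max_abs_error(istft(spec), tone))
    print("error with magnitudes only:     ", max_abs_error(istft(drop_phase(spec)), tone))
```

Running the script prints both reconstruction errors. The complex path stays at floating-point rounding level, while the magnitude-only path does not recover the waveform, which is the gap a separate vocoder has to close in a two-stage pipeline.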

Related papers