Music Source Restoration
- URL: http://arxiv.org/abs/2505.21827v1
- Date: Tue, 27 May 2025 23:27:31 GMT
- Title: Music Source Restoration
- Authors: Yongyi Zang, Zheqi Dai, Mark D. Plumbley, Qiuqiang Kong
- Abstract summary: We introduce Music Source Restoration (MSR), a novel task addressing the gap between idealized source separation and real-world music production. MSR models mixtures as degraded sums of individually degraded sources, with the goal of recovering original, undegraded signals. Due to the lack of data for MSR, we present RawStems, a dataset of annotations for 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Music Source Restoration (MSR), a novel task addressing the gap between idealized source separation and real-world music production. Current Music Source Separation (MSS) approaches assume mixtures are simple sums of sources, ignoring signal degradations employed during music production such as equalization, compression, and reverb. MSR models mixtures as degraded sums of individually degraded sources, with the goal of recovering original, undegraded signals. Due to the lack of data for MSR, we present RawStems, a dataset of annotations for 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours. To the best of our knowledge, RawStems is the first dataset that contains unprocessed music stems with hierarchical categories. We consider spectral filtering, dynamic range compression, harmonic distortion, reverb and lossy codec as possible degradations, and establish U-Former as a baseline method, demonstrating the feasibility of MSR on our dataset. We publicly release the RawStems dataset annotations, degradation simulation pipeline, training code and pre-trained models.
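The degraded-sum formulation above (a mixture as the sum of individually degraded sources, with the clean stems as restoration targets) can be sketched in Python. This is a minimal illustration, not the paper's actual degradation pipeline: the function names (`make_msr_pair`, `degrade`) are hypothetical, and the effects are crude stand-ins for three of the five degradations listed in the abstract (spectral filtering, dynamic range compression, harmonic distortion); reverb and lossy codecs are omitted to keep the sketch dependency-free.

```python
import numpy as np

def spectral_filter(x, taps=8):
    """Crude low-pass FIR as a placeholder for equalization."""
    kernel = np.ones(taps) / taps
    return np.convolve(x, kernel, mode="same")

def compress(x, threshold=0.5, ratio=4.0):
    """Hard-knee dynamic range compression above `threshold`."""
    mag = np.abs(x)
    over = mag > threshold
    gain = np.ones_like(x)
    gain[over] = (threshold + (mag[over] - threshold) / ratio) / mag[over]
    return x * gain

def distort(x, drive=3.0):
    """Soft-clipping harmonic distortion, normalized to unit peak."""
    return np.tanh(drive * x) / np.tanh(drive)

def degrade(x, rng):
    """Apply a random subset of degradations to one source."""
    for fx in (spectral_filter, compress, distort):
        if rng.random() < 0.5:
            x = fx(x)
    return x

def make_msr_pair(sources, seed=0):
    """Return (degraded mixture, clean targets) for one MSR training example."""
    rng = np.random.default_rng(seed)
    mixture = sum(degrade(s.copy(), rng) for s in sources)
    return mixture, sources
```

Under this formulation, each training pair is a degraded mixture together with the clean stems, and an MSR model is trained to map the mixture back to the undegraded sources, rather than merely to partition the mixture as in MSS.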
Related papers
- Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models [107.24906866038431]
We propose REWIRE, REcycling the Web with guIded REwrite, to enrich low-quality documents so that they become useful for training. We show that mixing high-quality raw texts with our rewritten texts leads to improvements of 1.0, 1.3 and 2.5 percentage points, respectively, across 22 diverse tasks.
arXiv Detail & Related papers (2025-06-05T07:12:12Z)
- Restoration Score Distillation: From Corrupted Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We propose Restoration Score Distillation (RSD), a principled generalization of Denoising Score Distillation (DSD). RSD accommodates a broader range of corruption types, such as blurred, incomplete, or low-resolution images. It consistently surpasses its teacher model across diverse restoration tasks on both natural and scientific datasets.
arXiv Detail & Related papers (2025-05-19T17:21:03Z)
- Score-informed Music Source Separation: Improving Synthetic-to-real Generalization in Classical Music [8.468436398420764]
Music source separation is the task of separating a mixture of instruments into constituent tracks. We propose two ways of using musical scores to aid music source separation: a score-informed model and a score-only model. The score-informed model improves separation results compared to a baseline approach, but struggles to generalize from synthetic to real data.
arXiv Detail & Related papers (2025-03-10T14:08:31Z)
- Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries [53.30852012059025]
Music source separation is an audio-to-audio retrieval task. Recent work in music source separation has begun to challenge the fixed-stem paradigm. We propose the use of hyperellipsoidal regions as queries to allow for an intuitive yet easily parametrizable approach to specifying both the target (location) and its spread.
arXiv Detail & Related papers (2025-01-27T16:13:50Z)
- Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models [26.373204974010086]
Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks.
This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings.
We propose an inference procedure enabling the coherent generation of sources and accompaniments.
arXiv Detail & Related papers (2024-03-18T12:08:01Z)
- Benchmarks and leaderboards for sound demixing tasks [44.99833362998488]
We introduce two new benchmarks for the sound source separation tasks.
We compare popular models for sound demixing, as well as their ensembles, on these benchmarks.
We also develop a novel approach for audio separation, based on the ensembling of different models that are suited best for the particular stem.
arXiv Detail & Related papers (2023-05-12T14:00:26Z)
- Blind Restoration of Real-World Audio by 1D Operational GANs [18.462912387382346]
We propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs).
The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets.
Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods.
arXiv Detail & Related papers (2022-12-30T10:11:57Z)
- Music Separation Enhancement with Generative Modeling [11.545349346125743]
We propose a post-processing model, Make it Sound Good (MSG), to enhance the output of music source separation systems.
Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.
arXiv Detail & Related papers (2022-08-26T00:44:37Z)
- Unsupervised Audio Source Separation Using Differentiable Parametric Source Models [8.80867379881193]
We propose an unsupervised model-based deep learning approach to musical source separation.
A neural network is trained to reconstruct the observed mixture as a sum of the sources.
The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods.
arXiv Detail & Related papers (2022-01-24T11:05:30Z) - Visual Scene Graphs for Audio Source Separation [65.47212419514761]
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments.
We propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs.
Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds.
arXiv Detail & Related papers (2021-09-24T13:40:51Z) - Multi-Stage Progressive Image Restoration [167.6852235432918]
We propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture that progressively learns restoration functions for the degraded inputs. The resulting tightly interlinked multi-stage architecture, named MPRNet, delivers strong performance gains on ten datasets.
arXiv Detail & Related papers (2021-02-04T18:57:07Z) - DeFlow: Learning Complex Image Degradations from Unpaired Data with
Conditional Flows [145.83812019515818]
We propose DeFlow, a method for learning image degradations from unpaired data.
We model the degradation process in the latent space of a shared flow-decoder network.
We validate our DeFlow formulation on the task of joint image restoration and super-resolution.
arXiv Detail & Related papers (2021-01-14T18:58:01Z) - Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.