SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
- URL: http://arxiv.org/abs/2508.03448v1
- Date: Tue, 05 Aug 2025 13:49:04 GMT
- Title: SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
- Authors: Jan Melechovsky, Ambuj Mehrish, Dorien Herremans
- Abstract summary: Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image. We introduce SonicMaster, the first unified generative model for music restoration and mastering that addresses a broad spectrum of audio artifacts with text-based control.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image, especially when created in non-professional settings without specialized equipment or expertise. These problems are typically corrected using separate specialized tools and manual adjustments. In this paper, we introduce SonicMaster, the first unified generative model for music restoration and mastering that addresses a broad spectrum of audio artifacts with text-based control. SonicMaster is conditioned on natural language instructions to apply targeted enhancements, or can operate in an automatic mode for general restoration. To train this model, we construct the SonicMaster dataset, a large dataset of paired degraded and high-quality tracks, by simulating common degradation types with nineteen degradation functions belonging to five enhancement groups: equalization, dynamics, reverb, amplitude, and stereo. Our approach leverages a flow-matching generative training paradigm to learn an audio transformation that maps degraded inputs to their cleaned, mastered versions, guided by text prompts. Objective audio quality metrics demonstrate that SonicMaster significantly improves sound quality across all artifact categories. Furthermore, subjective listening tests confirm that listeners prefer SonicMaster's enhanced outputs over the original degraded audio, highlighting the effectiveness of our unified approach.
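The two technical ingredients in the abstract are (1) paired-data construction by simulating degradations in five groups, and (2) a flow-matching objective that regresses a velocity field mapping degraded audio to its mastered version. The sketch below illustrates both on toy numpy signals. All function names are illustrative, not the paper's actual implementation, and the loss assumes the standard conditional flow-matching formulation with a linear interpolation path; the paper's exact degradation functions and conditioning mechanism are not specified here.

```python
import numpy as np

def hard_clip(x, threshold=0.3):
    # Amplitude group: simulate clipping distortion.
    return np.clip(x, -threshold, threshold)

def tonal_imbalance(x, alpha=0.9):
    # Equalization group: crude one-pole low-pass to dull the highs.
    y = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(x):
        acc = alpha * acc + (1 - alpha) * s
        y[i] = acc
    return y

def narrow_stereo(left, right, width=0.2):
    # Stereo group: shrink the side signal to narrow the stereo image.
    mid, side = (left + right) / 2, (left - right) / 2
    return mid + width * side, mid - width * side

def flow_matching_loss(model, x0, x1, t):
    # Conditional flow matching with a linear path: the model predicts
    # the velocity at the interpolated point x_t; the regression target
    # is the straight-line velocity (x1 - x0).
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0
    pred = model(xt, t)
    return float(np.mean((pred - target) ** 2))

# Build one degraded/clean training pair from a clean mono excerpt
# (1 second at 48 kHz of synthetic audio stands in for a real track).
rng = np.random.default_rng(0)
clean = 0.5 * rng.standard_normal(48_000)
degraded = hard_clip(tonal_imbalance(clean))

# Sanity check: an oracle that outputs the true velocity has zero loss.
oracle = lambda xt, t: clean - degraded
loss = flow_matching_loss(oracle, degraded, clean, t=0.5)
```

In training, `model` would be the text-conditioned network and `t` would be sampled uniformly per example; at inference the learned velocity field is integrated from the degraded input toward the mastered output.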
Related papers
- EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing [54.10773655199149]
We investigate leveraging cross-attention control for efficient audio editing within auto-regressive models. Inspired by image editing methodologies, we develop a Prompt-to-Prompt-like approach that guides edits through cross and self-attention mechanisms.
arXiv Detail & Related papers (2025-07-15T08:44:11Z) - Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing [60.38045088180188]
We propose an acoustic-prosody disentangled two-stage method to achieve high-quality dubbing generation with precise prosody alignment. We incorporate an in-domain emotion analysis module to reduce the impact of visual domain shifts across different movies. Our method performs favorably against the state-of-the-art models on two primary benchmarks.
arXiv Detail & Related papers (2025-03-15T08:25:57Z) - MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z) - Resource-constrained stereo singing voice cancellation [1.0962868591006976]
We study the problem of stereo singing voice cancellation.
Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial.
arXiv Detail & Related papers (2024-01-22T16:05:30Z) - Exploiting Time-Frequency Conformers for Music Audio Enhancement [21.243039524049614]
We propose a music enhancement system based on the Conformer architecture.
Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task.
arXiv Detail & Related papers (2023-08-24T06:56:54Z) - AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z) - End-to-end Music Remastering System Using Self-supervised and Adversarial Training [18.346033788545135]
We propose an end-to-end music remastering system that transforms the mastering style of input audio to that of the target.
The system is trained in a self-supervised manner, in which released pop songs were used for training.
We validate our results with quantitative metrics and a subjective listening test and show that the model generated samples of mastering style similar to the target.
arXiv Detail & Related papers (2022-02-17T08:50:12Z) - Toward Degradation-Robust Voice Conversion [94.60503904292916]
Any-to-any voice conversion technologies convert the vocal timbre of an utterance to any speaker even unseen during training.
It is difficult to collect clean utterances of a speaker, and they are usually degraded by noises or reverberations.
We report in this paper the first comprehensive study on the degradation of robustness of any-to-any voice conversion.
arXiv Detail & Related papers (2021-10-14T17:00:34Z) - Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation [96.18178553315472]
We propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio.
We integrate both stereo generation and source separation into a unified framework, Sep-Stereo.
arXiv Detail & Related papers (2020-07-20T06:20:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.