End-to-end Music Remastering System Using Self-supervised and
Adversarial Training
- URL: http://arxiv.org/abs/2202.08520v1
- Date: Thu, 17 Feb 2022 08:50:12 GMT
- Title: End-to-end Music Remastering System Using Self-supervised and
Adversarial Training
- Authors: Junghyun Koo, Seungryeol Paik, Kyogu Lee
- Abstract summary: We propose an end-to-end music remastering system that transforms the mastering style of input audio to that of the target.
The system is trained in a self-supervised manner using released pop songs.
We validate our results with quantitative metrics and a subjective listening test, and show that the model generates samples whose mastering style is similar to the target's.
- Score: 18.346033788545135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mastering is an essential step in music production, but it is also a
challenging task that must pass through the hands of experienced audio
engineers, who adjust the tone, space, and volume of a song. Remastering
follows the same technical process, with the added context of mastering a
song for the present era. As these tasks have high entry barriers, we aim to
lower them by proposing an end-to-end music remastering system that
transforms the mastering style of input audio to that of the target. The system
is trained in a self-supervised manner, using released pop songs as training
data. We further encourage the model to generate realistic audio reflecting
the reference's mastering style by applying a pre-trained encoder and a
projection discriminator. We validate our results with quantitative metrics
and a subjective listening test, and show that the model generates samples
whose mastering style is similar to the target's.
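The abstract names two concrete components: a pre-trained encoder that summarizes the reference's mastering style, and a projection discriminator that conditions its real/fake judgement on that style embedding. Below is a minimal PyTorch sketch of a projection discriminator in this setting; the convolutional backbone, all layer sizes, and the encoder interface are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    """Discriminator whose logit adds an inner product between the audio
    feature and a conditioning (mastering-style) embedding, following the
    projection-cGAN formulation. All sizes are illustrative."""

    def __init__(self, feat_dim=512, cond_dim=128):
        super().__init__()
        # Backbone over waveform segments: a stack of strided 1-D convs.
        self.backbone = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=15, stride=4, padding=7),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 256, kernel_size=15, stride=4, padding=7),
            nn.LeakyReLU(0.2),
            nn.Conv1d(256, feat_dim, kernel_size=15, stride=4, padding=7),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),           # -> (B, feat_dim, 1)
        )
        self.unconditional_head = nn.Linear(feat_dim, 1)
        # Projects the style embedding into the feature space.
        self.embed = nn.Linear(cond_dim, feat_dim, bias=False)

    def forward(self, audio, style_emb):
        # audio: (B, 1, T); style_emb: (B, cond_dim) from a pre-trained,
        # frozen mastering-style encoder (assumed interface).
        h = self.backbone(audio).squeeze(-1)   # (B, feat_dim)
        logit = self.unconditional_head(h)     # (B, 1)
        proj = (h * self.embed(style_emb)).sum(dim=1, keepdim=True)
        return logit + proj                    # projection cGAN logit

# Usage sketch: the generator is trained so D(remastered, ref_style) is high.
D = ProjectionDiscriminator()
fake = torch.randn(2, 1, 16384)      # generator output (placeholder)
ref_style = torch.randn(2, 128)      # pre-trained encoder output (placeholder)
score = D(fake, ref_style)           # (2, 1)
```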
Related papers
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
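The summary does not specify the counterfactual loss beyond its goal of aligning generated music with control prompts. One plausible, purely hypothetical reading is contrastive: real music should be more likely under its true control signal than under a randomly swapped one. The margin formulation and all names below are assumptions.

```python
import torch
import torch.nn.functional as F

def counterfactual_margin_loss(logits_true, logits_swap, targets, margin=1.0):
    """Hypothetical counterfactual loss: the per-token NLL of the real
    music under its true control should beat the NLL under a swapped
    (counterfactual) control by at least `margin`.

    logits_true / logits_swap: (B, T, V) decoder logits computed with the
    true and the swapped control signal; targets: (B, T) token ids."""
    nll_true = F.cross_entropy(
        logits_true.transpose(1, 2), targets, reduction="none").mean(dim=1)
    nll_swap = F.cross_entropy(
        logits_swap.transpose(1, 2), targets, reduction="none").mean(dim=1)
    # Hinge: only penalize when the true control is not clearly better.
    return F.relu(margin - (nll_swap - nll_true)).mean()

# Usage sketch with random tensors standing in for a token decoder.
B, T, V = 4, 64, 512
targets = torch.randint(V, (B, T))
loss = counterfactual_margin_loss(
    torch.randn(B, T, V), torch.randn(B, T, V), targets)
```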
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- Resource-constrained stereo singing voice cancellation [1.0962868591006976]
We study the problem of stereo singing voice cancellation.
Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial.
arXiv Detail & Related papers (2024-01-22T16:05:30Z)
- Controllable Music Production with Diffusion Models and Guidance Gradients [3.187381965457262]
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in 44.1kHz stereo audio.
The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips.
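As a rough sketch of guidance gradients in this setting: at each reverse-diffusion step, the intermediate audio is nudged down the gradient of a task loss, e.g. the reconstruction error on the known region for inpainting. The update rule below is a deliberately simplified stand-in, not the paper's sampler; `denoiser` and `guidance_loss` are assumed interfaces.

```python
import torch

def guided_denoising_step(x_t, t, denoiser, guidance_loss, scale=1.0):
    """One reverse-diffusion step with an extra guidance gradient.

    denoiser(x_t, t) -> predicted clean audio x0_hat (assumed interface).
    guidance_loss(x0_hat) -> scalar task loss, e.g. L2 to the known
    region for inpainting, or distance to a target style embedding."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)
    loss = guidance_loss(x0_hat)
    grad = torch.autograd.grad(loss, x_t)[0]
    # The proper posterior-mean step is omitted; crude partial denoise instead.
    x_prev = x0_hat + 0.5 * (x_t - x0_hat)
    return (x_prev - scale * grad).detach()   # steer with the gradient

# Usage sketch: an inpainting guidance that pins the first half of the clip.
known = torch.randn(1, 2, 44100)              # known stereo audio region
mask = torch.zeros_like(known)
mask[..., :22050] = 1.0
guidance = lambda x0: ((x0 - known) * mask).pow(2).mean()
denoiser = lambda x, t: x * 0.9               # placeholder network
x = torch.randn(1, 2, 44100)
for t in reversed(range(10)):
    x = guided_denoising_step(x, t, denoiser, guidance, scale=0.3)
```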
arXiv Detail & Related papers (2023-11-01T16:01:01Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
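A minimal sketch of MLM-style acoustic pre-training with teacher pseudo-labels, as described: frames are masked and the student predicts the teacher's discrete code at each masked position. The tokenizer, masking ratio, and network sizes below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlm_acoustic_loss(student, teacher_codes, feats, mask_ratio=0.3):
    """MLM-style loss: mask a subset of frames and predict the teacher's
    pseudo-label (a discrete code per frame) at the masked positions only.

    feats: (B, T, D) acoustic features; teacher_codes: (B, T) int codes
    produced offline by a frozen teacher model (assumed)."""
    B, T, D = feats.shape
    mask = torch.rand(B, T) < mask_ratio               # True = masked frame
    corrupted = feats.masked_fill(mask.unsqueeze(-1), 0.0)
    logits = student(corrupted)                        # (B, T, num_codes)
    return F.cross_entropy(logits[mask], teacher_codes[mask])

# Usage sketch with a tiny stand-in student network.
num_codes, D = 1024, 80
student = nn.Sequential(nn.Linear(D, 256), nn.GELU(), nn.Linear(256, num_codes))
feats = torch.randn(2, 200, D)
teacher_codes = torch.randint(num_codes, (2, 200))
loss = mlm_acoustic_loss(student, teacher_codes, feats)
```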
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
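The described GETScore layout can be pictured as a 2-D grid of token ids, one row per track and one column per time step; conditioning on any source tracks then reduces to a row mask. A small illustration, with an assumed vocabulary and padding scheme:

```python
import numpy as np

# Illustrative GETScore-style grid: one row per track, one column per
# time step, each cell a token id (0 = empty/pad). Vocabulary is assumed.
PAD = 0
tracks = ["melody", "bass", "drums"]
T = 16                                   # time steps (e.g., 16th notes)

score = np.full((len(tracks), T), PAD, dtype=np.int64)
score[0, 0] = 60                         # melody: token for a note event
score[1, 0] = 36                         # bass
score[2, ::4] = 100                      # drums: kick on every beat

# Any source->target combination is just a row mask: the given rows are
# conditioning, and the masked rows are generated non-autoregressively.
source_mask = np.array([True, False, False])   # condition on melody only
targets = score[~source_mask]                  # rows the model must fill
```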
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- Music Instrument Classification Reprogrammed [79.68916470119743]
"Reprogramming" is a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model.
We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par or even outperform state-of-the-art systems at a fraction of training parameters.
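A minimal sketch of the reprogramming recipe: a trainable additive perturbation maps instrument audio into the input domain of a frozen pre-trained network, and a small output layer remaps its class logits to instrument labels, so only a fraction of the parameters are trained. The frozen model and all sizes below are placeholders.

```python
import torch
import torch.nn as nn

class Reprogrammer(nn.Module):
    """Wraps a frozen pre-trained model: only the input perturbation and
    the output label-mapping layer are trained. Shapes are illustrative."""

    def __init__(self, frozen_model, in_len=16000, src_classes=527, tgt_classes=11):
        super().__init__()
        self.frozen = frozen_model
        for p in self.frozen.parameters():
            p.requires_grad_(False)                # keep pre-trained weights fixed
        self.delta = nn.Parameter(torch.zeros(1, in_len))  # input reprogramming
        self.remap = nn.Linear(src_classes, tgt_classes)   # output label mapping

    def forward(self, x):
        # x: (B, in_len) instrument audio. Add the learned perturbation,
        # run the frozen model, then remap its class logits.
        return self.remap(self.frozen(x + self.delta))

# Usage sketch with a stand-in "pre-trained" model.
frozen = nn.Sequential(nn.Linear(16000, 527))      # placeholder network
model = Reprogrammer(frozen)
logits = model(torch.randn(4, 16000))              # (4, 11) instrument logits
trainable = [p for p in model.parameters() if p.requires_grad]
```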
arXiv Detail & Related papers (2022-11-15T18:26:01Z)
- Automatic music mixing with deep learning and out-of-domain data [10.670987762781834]
Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge.
We propose a novel data preprocessing method that allows the models to perform automatic music mixing.
We also redesigned a listening test method for evaluating music mixing systems.
arXiv Detail & Related papers (2022-08-24T10:50:22Z)
- Learning music audio representations via weak language supervision [14.335950077921435]
We design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks.
Weak supervision is provided in the form of noisy natural language descriptions conveying the overall musical content of the track.
We demonstrate the usefulness of our approach by comparing the performance of audio representations produced by the same audio backbone with different training strategies.
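The proxy tasks are not spelled out in this summary; one representative choice under weak caption supervision would be a symmetric audio-text contrastive (InfoNCE) objective, sketched below purely as an assumption.

```python
import torch
import torch.nn.functional as F

def audio_text_contrastive(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch: each track's audio embedding should
    match its own (noisy) caption embedding. The two encoders are assumed
    to exist and to output vectors of the same dimension."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature           # (B, B) similarity matrix
    labels = torch.arange(a.size(0))         # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

loss = audio_text_contrastive(torch.randn(8, 256), torch.randn(8, 256))
```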
arXiv Detail & Related papers (2021-12-08T10:30:52Z)
- Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds [118.54908665440826]
Humans can robustly recognize and localize objects by using visual and/or auditory cues.
This work develops an approach for scene understanding purely based on sounds.
The co-existence of visual and audio cues is leveraged for supervision transfer.
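One standard reading of "supervision transfer" here is cross-modal distillation: a frozen vision network labels the video frames, and the audio network learns to reproduce those soft labels from the time-aligned binaural sound alone. The sketch below assumes that reading; the actual pipeline may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def supervision_transfer_loss(audio_net, vision_net, audio, frames,
                              temperature=2.0):
    """Cross-modal distillation sketch: a frozen vision teacher produces
    soft semantic labels from video frames; the audio student matches them
    from the synchronized binaural sound alone."""
    with torch.no_grad():
        teacher_logits = vision_net(frames)        # (B, C) frozen teacher
    student_logits = audio_net(audio)              # (B, C) audio student
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    )

# Usage sketch with placeholder networks.
vision_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
audio_net = nn.Sequential(nn.Flatten(), nn.Linear(2 * 16000, 10))
loss = supervision_transfer_loss(
    audio_net, vision_net,
    torch.randn(4, 2, 16000), torch.randn(4, 3, 32, 32))
```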
arXiv Detail & Related papers (2021-09-06T22:24:00Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model trained for automatic speech recognition with extracted melody features to drive a waveform-based generator.
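A sketch of the conditioning described: frame-level content features from an ASR acoustic model are concatenated with extracted melody features (e.g., F0 and voicing) and upsampled by a waveform generator. The dimensions and the transposed-convolution stack below are placeholders.

```python
import torch
import torch.nn as nn

class ConditionedWaveGenerator(nn.Module):
    """Waveform generator driven by concatenated content (ASR) and melody
    features, as in wav-to-wav conversion. Dimensions are illustrative;
    total upsampling is 8 * 8 * 4 = 256 samples per feature frame."""

    def __init__(self, asr_dim=256, melody_dim=2):
        super().__init__()
        ch = asr_dim + melody_dim
        # Transposed convs upsample frame-rate features to sample rate.
        self.net = nn.Sequential(
            nn.ConvTranspose1d(ch, 128, kernel_size=16, stride=8, padding=4),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(128, 64, kernel_size=16, stride=8, padding=4),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(64, 1, kernel_size=8, stride=4, padding=2),
            nn.Tanh(),
        )

    def forward(self, asr_feats, melody_feats):
        # asr_feats: (B, T, asr_dim) from a pre-trained ASR acoustic model;
        # melody_feats: (B, T, melody_dim), e.g. F0 and a voicing flag.
        cond = torch.cat([asr_feats, melody_feats], dim=-1).transpose(1, 2)
        return self.net(cond)                    # (B, 1, T * 256) waveform

gen = ConditionedWaveGenerator()
wave = gen(torch.randn(2, 100, 256), torch.randn(2, 100, 2))  # (2, 1, 25600)
```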
arXiv Detail & Related papers (2020-08-06T18:29:11Z)