Automatic music mixing with deep learning and out-of-domain data
- URL: http://arxiv.org/abs/2208.11428v1
- Date: Wed, 24 Aug 2022 10:50:22 GMT
- Title: Automatic music mixing with deep learning and out-of-domain data
- Authors: Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Stefan Uhlich, Chihiro Nagashima, Yuki Mitsufuji
- Abstract summary: Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge.
We propose a novel data preprocessing method that allows supervised deep learning models to perform automatic music mixing.
We also redesigned a listening test method for evaluating music mixing systems.
- Score: 10.670987762781834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music mixing traditionally involves recording instruments in the form of
clean, individual tracks and blending them into a final mixture using audio
effects and expert knowledge (e.g., a mixing engineer). The automation of music
production tasks has become an emerging field in recent years, where rule-based
methods and machine learning approaches have been explored. Nevertheless, the
lack of dry or clean instrument recordings limits the performance of such
models, which remains far from that of professional human-made mixes. We explore
whether we can use out-of-domain data such as wet or processed multitrack music
recordings and repurpose it to train supervised deep learning models that can
bridge the current gap in automatic mixing quality. To achieve this, we propose
a novel data preprocessing method that allows the models to perform automatic
music mixing. We also redesigned a listening test method for evaluating music
mixing systems. We validate our results through such subjective tests using
highly experienced mixing engineers as participants.
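
The data preprocessing method is described above only at a high level. The following is a minimal sketch of what such preprocessing could look like, assuming the core idea is to normalize each wet stem's loudness and average spectral balance toward dataset-wide targets before supervised training; the function names and targets are illustrative assumptions, not the authors' published procedure.

```python
# Hypothetical sketch of "effect normalization" preprocessing: each wet stem
# is pushed toward a dataset-average loudness and spectral balance so that
# processed (out-of-domain) recordings can stand in for dry ones during
# supervised training. Names and targets are assumptions, not the paper's code.
import numpy as np

def rms_normalize(track: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Scale a mono track to a fixed RMS level (a crude loudness proxy)."""
    rms = np.sqrt(np.mean(track ** 2) + 1e-12)
    return track * (target_rms / rms)

def average_spectrum(tracks: list, n_fft: int = 2048) -> np.ndarray:
    """Dataset-wide average magnitude spectrum, used as the EQ target."""
    specs = []
    for t in tracks:
        usable = len(t) // n_fft * n_fft
        frames = np.abs(np.fft.rfft(t[:usable].reshape(-1, n_fft), axis=1))
        specs.append(frames.mean(axis=0))
    return np.mean(specs, axis=0)

def spectral_normalize(track: np.ndarray, target_spec: np.ndarray,
                       n_fft: int = 2048) -> np.ndarray:
    """Apply one global EQ curve that moves the track's average magnitude
    spectrum toward the dataset target."""
    usable = len(track) // n_fft * n_fft
    frames = np.fft.rfft(track[:usable].reshape(-1, n_fft), axis=1)
    track_spec = np.abs(frames).mean(axis=0)
    eq_curve = target_spec / (track_spec + 1e-12)   # frequency-wise gain
    out = np.fft.irfft(frames * eq_curve, n=n_fft, axis=1).reshape(-1)
    return np.concatenate([out, track[usable:]])    # keep the leftover tail
```

A production pipeline would use overlapping analysis windows and a perceptual loudness measure such as LUFS; this sketch only illustrates the shape of the normalization step.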
Related papers
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns (see the interleaving sketch after this list).
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID rather than for each audio sample, pretraining the audio representation model in the first stage of a two-stage method.
Experiments show that the method outperforms state-of-the-art approaches based on contrastive learning or self-supervised classification (see the loss sketch after this list).
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Music Instrument Classification Reprogrammed [79.68916470119743]
"Reprogramming" is a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model.
We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par or even outperform state-of-the-art systems at a fraction of training parameters.
arXiv Detail & Related papers (2022-11-15T18:26:01Z)
- Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects [23.29395422386749]
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song.
This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a reference music recording.
arXiv Detail & Related papers (2022-11-04T03:45:17Z)
- Improved singing voice separation with chromagram-based pitch-aware remixing [26.299721372221736]
We propose chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed.
We demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR); see the pair-selection sketch after this list.
arXiv Detail & Related papers (2022-03-28T20:55:54Z)
- End-to-end Music Remastering System Using Self-supervised and Adversarial Training [18.346033788545135]
We propose an end-to-end music remastering system that transforms the mastering style of input audio to that of the target.
The system is trained in a self-supervised manner on released pop songs.
We validate our results with quantitative metrics and a subjective listening test, showing that the model generates samples whose mastering style is similar to the target.
arXiv Detail & Related papers (2022-02-17T08:50:12Z) - Modeling the Compatibility of Stem Tracks to Generate Music Mashups [6.922825755771942]
A music mashup combines audio elements from two or more songs to create a new work.
Research has developed algorithms that predict the compatibility of audio elements.
arXiv Detail & Related papers (2021-03-26T01:51:11Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method uses an acoustic model trained for automatic speech recognition, together with extracted melody features, to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z) - Learning to Denoise Historical Music [30.165194151843835]
We propose an audio-to-audio neural network model that learns to denoise old music recordings.
The network is trained with both reconstruction and adversarial objectives on a noisy music dataset.
Our results show that the proposed method is effective in removing noise while preserving the quality and details of the original music (see the objective sketch after this list).
arXiv Detail & Related papers (2020-08-05T10:05:44Z) - RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement
Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
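
For the MusicGen entry above, the token interleaving idea can be illustrated with a delay pattern, in which codebook k is shifted right by k steps so that a single-stage transformer LM can predict all codebooks in one pass. The helper below is a hedged sketch of that pattern, not MusicGen's actual implementation; the pad token value is an arbitrary assumption.

```python
# Hedged sketch of a "delay pattern" for interleaving parallel codebook tokens;
# codebook k is shifted right by k steps, so one autoregressive pass can
# condition each codebook on the coarser ones. Not MusicGen's actual code.
import numpy as np

def delay_interleave(codes: np.ndarray, pad_token: int = -1) -> np.ndarray:
    """codes: (K codebooks, T steps) -> (K, T + K - 1) delayed layout."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), pad_token, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]  # codebook k starts k steps late
    return out

# Example: 4 codebooks over 6 time steps of discrete audio tokens.
print(delay_interleave(np.arange(24).reshape(4, 6)))
```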
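
For the anomalous sound detection entry, the sketch below implements a generic supervised-contrastive loss keyed on machine IDs, so that clips sharing an ID act as positives; the paper's exact two-stage recipe and loss formulation may differ.

```python
# Hedged sketch of contrastive pretraining keyed on machine IDs: embeddings of
# clips from the same machine are pulled together, others pushed apart.
import torch
import torch.nn.functional as F

def machine_id_contrastive_loss(embeddings: torch.Tensor,
                                machine_ids: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(embeddings, dim=1)          # (N, D) unit-norm embeddings
    sim = z @ z.t() / temperature               # (N, N) scaled similarities
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos = (machine_ids.unsqueeze(0) == machine_ids.unsqueeze(1)) & ~eye
    sim = sim.masked_fill(eye, float("-inf"))   # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return -per_anchor.mean()

# Usage: 8 random clip embeddings from 2 machines.
loss = machine_id_contrastive_loss(torch.randn(8, 128),
                                   torch.tensor([0, 0, 0, 0, 1, 1, 1, 1]))
```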
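
For the reprogramming entry, the sketch below wraps a frozen pre-trained network with a trainable input perturbation and a linear output label mapping; the stand-in backbone and class counts are hypothetical placeholders.

```python
# Hedged sketch of model reprogramming: a frozen backbone is reused for a new
# task through a learned input perturbation plus an output label mapping.
import torch
import torch.nn as nn

class Reprogram(nn.Module):
    def __init__(self, backbone: nn.Module, in_len: int,
                 src_classes: int, tgt_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                      # backbone stays frozen
        self.delta = nn.Parameter(torch.zeros(in_len))   # trainable input modification
        self.out_map = nn.Linear(src_classes, tgt_classes)  # trainable label mapping

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out_map(self.backbone(x + self.delta))

# Stand-in "pre-trained" audio classifier (replace with a real frozen model).
backbone = nn.Sequential(nn.Linear(16000, 512), nn.ReLU(), nn.Linear(512, 527))
model = Reprogram(backbone, in_len=16000, src_classes=527, tgt_classes=11)
print(model(torch.randn(2, 16000)).shape)  # torch.Size([2, 11])
```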
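
For the pitch-aware remixing entry, the sketch below scores candidate pairs by the cosine similarity of their time-averaged chromagrams and only mixes well-aligned segments; it assumes librosa for chroma extraction, and the threshold and selection scheme are illustrative assumptions.

```python
# Hedged sketch of chromagram-based pair selection for pitch-aware remixing:
# two equal-length segments are mixed only when their time-averaged
# chromagrams are similar.
import numpy as np
import librosa

def chroma_alignment(seg_a: np.ndarray, seg_b: np.ndarray, sr: int = 22050) -> float:
    """Cosine similarity of time-averaged chromagrams as a pitch-alignment score."""
    ca = librosa.feature.chroma_stft(y=seg_a, sr=sr).mean(axis=1)
    cb = librosa.feature.chroma_stft(y=seg_b, sr=sr).mean(axis=1)
    return float(np.dot(ca, cb) / (np.linalg.norm(ca) * np.linalg.norm(cb) + 1e-12))

def pitch_aware_remix(vocal: np.ndarray, accomp: np.ndarray,
                      sr: int = 22050, threshold: float = 0.9):
    """Return a new training mixture if the pair is pitch-aligned, else None."""
    if chroma_alignment(vocal, accomp, sr) >= threshold:
        return vocal + accomp
    return None
```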
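
For the historical-music denoising entry, the sketch below combines an L1 reconstruction term with a least-squares adversarial term from a discriminator; the specific loss types and the weighting are assumptions rather than the paper's exact objectives.

```python
# Hedged sketch of a denoiser objective combining reconstruction and adversarial
# terms; L1 + LSGAN and the 0.01 weight are illustrative assumptions.
import torch
import torch.nn.functional as F

def generator_loss(denoised: torch.Tensor, clean: torch.Tensor,
                   disc: torch.nn.Module, adv_weight: float = 0.01) -> torch.Tensor:
    score = disc(denoised)                              # critic's realism score
    recon = F.l1_loss(denoised, clean)                  # reconstruction objective
    adv = F.mse_loss(score, torch.ones_like(score))     # least-squares GAN term
    return recon + adv_weight * adv
```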
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.