Automatic Equalization for Individual Instrument Tracks Using Convolutional Neural Networks
- URL: http://arxiv.org/abs/2407.16691v1
- Date: Tue, 23 Jul 2024 17:55:25 GMT
- Title: Automatic Equalization for Individual Instrument Tracks Using Convolutional Neural Networks
- Authors: Florian Mockenhaupt, Joscha Simon Rieber, Shahan Nercessian
- Abstract summary: We propose a novel approach for the automatic equalization of individual musical instrument tracks.
Our method begins by identifying the instrument present within a source recording in order to choose its corresponding ideal spectrum as a target.
We build upon a differentiable parametric equalizer matching neural network, demonstrating improvements relative to the previously established state of the art.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach for the automatic equalization of individual musical instrument tracks. Our method begins by identifying the instrument present within a source recording in order to choose its corresponding ideal spectrum as a target. Next, the spectral difference between the recording and the target is calculated, and accordingly, an equalizer matching model is used to predict settings for a parametric equalizer. To this end, we build upon a differentiable parametric equalizer matching neural network, demonstrating improvements relative to the previously established state of the art. Unlike past approaches, we show how our system naturally allows real-world audio data to be leveraged during the training of our matching model, effectively generating suitably produced training targets in an automated manner mirroring conditions at inference time. Consequently, we illustrate how fine-tuning our matching model on such examples considerably improves parametric equalizer matching performance in real-world scenarios, decreasing mean absolute error by 24% relative to methods relying solely on random parameter sampling techniques as a self-supervised learning strategy. We perform listening tests, and demonstrate that our proposed automatic equalization solution subjectively enhances the tonal characteristics of recordings of common instrument types.
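The pipeline described in the abstract (identify the instrument, pick its ideal spectrum as a target, compute the spectral difference, then predict equalizer settings) can be sketched as follows. This is a minimal illustration, not the paper's model: the signal and the target curve are synthetic placeholders, and the differentiable matching network is replaced by a toy grid search over the gain of a single hypothetical low-shelf correction.

```python
import numpy as np

def avg_log_spectrum(x, n_fft=2048, hop=512, eps=1e-8):
    """Average log-magnitude spectrum (dB) over Hann-windowed frames."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    mags = [np.abs(np.fft.rfft(x[i * hop : i * hop + n_fft] * win))
            for i in range(n_frames)]
    return 20 * np.log10(np.mean(mags, axis=0) + eps)

sr = 44100
rng = np.random.default_rng(0)
source = rng.standard_normal(sr)        # 1 s of noise standing in for a recording
target_db = np.zeros(2048 // 2 + 1)     # flat curve standing in for an ideal spectrum

# Steps 1-2: spectral difference between the recording and its target.
diff_db = target_db - avg_log_spectrum(source)

# Step 3 (toy stand-in for the matching network): grid-search the gain of a
# single low-shelf correction minimizing mean absolute error against diff_db.
freqs = np.fft.rfftfreq(2048, 1 / sr)
shelf = 1.0 / (1.0 + (freqs / 500.0) ** 2)   # crude low-shelf response shape
gains = np.linspace(-24.0, 24.0, 97)
mae = [float(np.mean(np.abs(diff_db - g * shelf))) for g in gains]
best_gain = gains[int(np.argmin(mae))]
```

In the paper this last step is a neural network predicting the parameters of a full multi-band parametric equalizer, trained with a differentiable EQ so the spectral loss can be backpropagated.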
Related papers
- Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models
This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models.
It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically.
The proposed architecture is well-suited for low-latency and real-time applications, crucial in live audio processing.
arXiv Detail & Related papers (2024-08-22T17:03:08Z)
- Serenade: A Model for Human-in-the-loop Automatic Chord Estimation
We evaluate our model on a dataset of popular music and show that, with this human-in-the-loop approach, harmonic analysis performance improves over a model-only approach.
arXiv Detail & Related papers (2023-10-17T11:31:29Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
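The contrastive pretraining step can be illustrated with a minimal NumPy version of an InfoNCE-style objective. This is a generic sketch, not the paper's implementation: which rows count as positives (here, simply same-index rows of the two batches) is exactly what the paper's machine-ID grouping would decide.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))      # cross-entropy on the diagonal
```

With well-separated embeddings and a low temperature the loss approaches zero; pretraining pushes clips sharing a machine ID toward that regime.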
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Fully Automated End-to-End Fake Audio Detection
This paper proposes a fully automated end-to-end fake audio detection method.
We first use wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Optimizing model-agnostic Random Subspace ensembles
We present a model-agnostic ensemble approach for supervised learning.
The proposed approach iteratively learns an ensemble of models using a parametric version of the Random Subspace approach.
We show the good performance of the proposed approach, both in terms of prediction and feature ranking, on simulated and real-world datasets.
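The Random Subspace idea itself is simple to sketch: fit many base learners, each on a random subset of the features, and average their predictions. The following toy version uses linear least-squares base learners on synthetic data; the paper's contribution, a parametric and optimizable version of this sampling, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[:5] = 1.0                       # only 5 of 20 features carry signal
y = X @ w_true + 0.1 * rng.standard_normal(200)

def random_subspace_ensemble(X, y, n_models=50, k=8):
    """Average the predictions of least-squares models, each fit on k
    randomly chosen features (the classic Random Subspace scheme)."""
    preds = np.zeros(len(y))
    for _ in range(n_models):
        idx = rng.choice(X.shape[1], size=k, replace=False)
        w, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        preds += X[:, idx] @ w
    return preds / n_models

preds = random_subspace_ensemble(X, y)
mse = float(np.mean((preds - y) ** 2))
```

Because each model sees a different feature subset, how often a feature appears in low-error models also yields a natural feature ranking, which is the second axis the paper evaluates.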
arXiv Detail & Related papers (2021-09-07T13:58:23Z)
- Training a Deep Neural Network via Policy Gradients for Blind Source Separation in Polyphonic Music Recordings
We propose a method for the blind separation of sounds of musical instruments in audio signals.
We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics.
Our algorithm yields high-quality results with particularly low interference on a variety of different audio samples.
arXiv Detail & Related papers (2021-07-09T06:17:04Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
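The ideal-ratio-mask oracle behind this proxy can be sketched directly: given the (normally unobservable) magnitude spectrograms of the individual sources, each source's mask is its share of the summed magnitudes. The spectrograms below are random placeholders, not real audio features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy magnitude spectrograms (freq bins x frames) for two "instruments".
s1 = np.abs(rng.standard_normal((513, 100)))
s2 = np.abs(rng.standard_normal((513, 100)))

def ideal_ratio_masks(mags, eps=1e-8):
    """Oracle masks: each source's fraction of the summed magnitudes per bin."""
    total = np.sum(mags, axis=0) + eps
    return [m / total for m in mags]

m1, m2 = ideal_ratio_masks([s1, s2])
mix = s1 + s2
est1 = m1 * mix                        # oracle estimate of source 1
oracle_err = float(np.mean(np.abs(est1 - s1)))
```

Applying the oracle mask to the mixture gives an upper bound on mask-based separation quality, which is why it works as a training-free proxy for how separable a dataset's instruments are.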
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- A Hybrid Approach to Audio-to-Score Alignment
Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece.
Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features.
We explore the usage of neural networks as a preprocessing step for DTW-based automatic alignment methods.
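The DTW core that such hybrid systems build on reduces to a small dynamic program. In this sketch the inputs are plain 1-D sequences with an absolute-difference frame cost; in practice they would be chroma or neural-network features per frame, and backtracking through the cost matrix recovers the audio-to-score mapping.

```python
import numpy as np

def dtw_cost(a, b):
    """Classic DTW alignment cost between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # D[i, j]: best cost aligning a[:i], b[:j]
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(float(a[i - 1]) - float(b[j - 1]))
            D[i, j] = c + min(D[i - 1, j],      # insertion
                              D[i, j - 1],      # deletion
                              D[i - 1, j - 1])  # match
    return float(D[n, m])
```

A repeated value in one sequence can be absorbed at zero cost, which is exactly the tempo-warping behavior alignment relies on.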
arXiv Detail & Related papers (2020-07-28T16:04:19Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
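Aligning first- and second-order statistics per frequency band amounts to a band-wise standardize-and-rescale. The features below are random stand-ins for log-mel frames; the per-band moments are all the method needs, no labels required.

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy per-frame log-mel features: (frames, bands) for each domain.
src = rng.normal(0.0, 1.0, (500, 40))   # source-domain statistics
tgt = rng.normal(2.0, 3.0, (500, 40))   # mismatched target-domain statistics

# Align the mean and standard deviation of every band to the source domain.
mu_t, sd_t = tgt.mean(axis=0), tgt.std(axis=0)
mu_s, sd_s = src.mean(axis=0), src.std(axis=0)
aligned = (tgt - mu_t) / (sd_t + 1e-8) * sd_s + mu_s
```

After this transform the target-domain features match the source-domain per-band moments, so a classifier trained on the source domain sees statistically familiar inputs.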
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.