Resource-constrained stereo singing voice cancellation
- URL: http://arxiv.org/abs/2401.12068v1
- Date: Mon, 22 Jan 2024 16:05:30 GMT
- Title: Resource-constrained stereo singing voice cancellation
- Authors: Clara Borrelli, James Rae, Dogac Basaran, Matt McVicar, Mehrez Souden,
Matthias Mauch
- Abstract summary: We study the problem of stereo singing voice cancellation.
Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of stereo singing voice cancellation, a subtask of music
source separation, whose goal is to estimate an instrumental background from a
stereo mix. We explore how to achieve performance similar to large
state-of-the-art source separation networks starting from a small, efficient
model for real-time speech separation. Such a model is useful when memory and
compute are limited and singing voice processing has to run with limited
look-ahead. In practice, this is realised by adapting an existing mono model to
handle stereo input. Improvements in quality are obtained by tuning model
parameters and expanding the training set. Moreover, we highlight the benefits
a stereo model brings by introducing a new metric which detects attenuation
inconsistencies between channels. Our approach is evaluated using objective
offline metrics and a large-scale MUSHRA trial, confirming the effectiveness of
our techniques in stringent listening tests.
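The abstract mentions a new metric that detects attenuation inconsistencies between channels, but does not give its formula. A minimal illustrative sketch of one plausible formulation in NumPy follows; the function names, the framing scheme, and the dB-difference definition are all assumptions, not the paper's actual metric:

```python
import numpy as np

def channel_attenuation_db(mix, est, frame=4096, eps=1e-12):
    """Per-frame attenuation (in dB) applied to one channel:
    energy of the estimated instrumental vs. the original mix."""
    n = (len(mix) // frame) * frame  # drop the ragged tail
    m = mix[:n].reshape(-1, frame)
    e = est[:n].reshape(-1, frame)
    mix_energy = np.sum(m ** 2, axis=1) + eps
    est_energy = np.sum(e ** 2, axis=1) + eps
    return 10.0 * np.log10(est_energy / mix_energy)

def stereo_attenuation_inconsistency(mix_lr, est_lr, frame=4096):
    """Mean absolute left/right difference in per-frame attenuation.
    0 dB means both channels were attenuated identically; larger
    values indicate the separator treated the channels differently."""
    att_l = channel_attenuation_db(mix_lr[0], est_lr[0], frame)
    att_r = channel_attenuation_db(mix_lr[1], est_lr[1], frame)
    return float(np.mean(np.abs(att_l - att_r)))
```

For example, an estimate that scales both channels by the same factor scores 0 dB, while scaling the left channel by 0.5 and the right by 0.25 yields a 6 dB inconsistency.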
Related papers
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Self-supervised Auxiliary Loss for Metric Learning in Music Similarity-based Retrieval and Auto-tagging [0.0]
We propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge.
We also found that refraining from employing augmentation during the fine-tuning phase yields better results.
arXiv Detail & Related papers (2023-04-15T02:00:28Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Improved singing voice separation with chromagram-based pitch-aware remixing [26.299721372221736]
We propose chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed.
We demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).
arXiv Detail & Related papers (2022-03-28T20:55:54Z)
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain that iteratively converts noise into a mel-spectrogram conditioned on the music score.
Evaluations on a Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work by a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model, trained for automatic speech recognition, with melody-derived features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations [32.456824945999465]
We propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags.
We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks.
arXiv Detail & Related papers (2020-06-15T13:17:18Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.