Mixing-Specific Data Augmentation Techniques for Improved Blind
Violin/Piano Source Separation
- URL: http://arxiv.org/abs/2008.02480v1
- Date: Thu, 6 Aug 2020 07:02:24 GMT
- Title: Mixing-Specific Data Augmentation Techniques for Improved Blind
Violin/Piano Source Separation
- Authors: Ching-Yu Chiu, Wen-Yi Hsiao, Yin-Cheng Yeh, Yi-Hsuan Yang, Alvin
Wen-Yu Su
- Abstract summary: Blind music source separation has been a popular subject of research in both the music information retrieval and signal processing communities.
To counter the lack of available multi-track data for supervised model training, a data augmentation method that creates artificial mixtures has been shown useful in recent works.
We consider more sophisticated mixing settings employed in the modern music production routine, the relationship between the tracks to be combined, and factors of silence.
- Score: 29.956390660450484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Blind music source separation has been a popular and active subject of
research in both the music information retrieval and signal processing
communities. To counter the lack of available multi-track data for supervised
model training, a data augmentation method that creates artificial mixtures by
combining tracks from different songs has been shown useful in recent works.
Building on this idea, we examine in this paper extended data
augmentation methods that consider more sophisticated mixing settings employed
in the modern music production routine, the relationship between the tracks to
be combined, and factors of silence. As a case study, we consider the
separation of violin and piano tracks in a violin piano ensemble, evaluating
the performance in terms of common metrics, namely SDR, SIR, and SAR. In
addition to examining the effectiveness of these new data augmentation methods,
we also study the influence of the amount of training data. Our evaluation
shows that the proposed mixing-specific data augmentation methods can help
improve the performance of a deep learning-based model for source separation,
especially in the case of small training data.
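The core idea of the abstract — creating artificial mixtures by combining tracks from different songs, then scoring separation quality with SDR — can be sketched as follows. The random gain range, track names, and the simple energy-ratio form of SDR are illustrative assumptions, not the paper's exact mixing settings or the full BSS Eval metric definitions.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB, in its simplest energy-ratio form."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

def mix_tracks(violin, piano, rng, gain_db_range=(-6.0, 6.0)):
    """Create an artificial violin/piano mixture with random per-track gains.

    The random gain is a simplified stand-in for the mixing-specific
    settings (level balancing, etc.) studied in the paper.
    """
    def with_random_gain(x):
        gain_db = rng.uniform(*gain_db_range)
        return x * 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude

    violin_scaled = with_random_gain(violin)
    piano_scaled = with_random_gain(piano)
    # Return the mixture plus the gain-adjusted targets for supervision.
    return violin_scaled + piano_scaled, violin_scaled, piano_scaled

rng = np.random.default_rng(0)
violin = rng.standard_normal(44100)  # one second of placeholder audio at 44.1 kHz
piano = rng.standard_normal(44100)
mixture, violin_target, piano_target = mix_tracks(violin, piano, rng)
```

In practice the violin and piano tracks would come from different songs, which is what makes the augmentation "cross-song"; the gain-adjusted stems serve as the training targets for the separation model.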
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of using data similarity and clients sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z)
- A Survey on Mixup Augmentations and Beyond [59.578288906956736]
Mixup and relevant data-mixing methods that convexly combine selected samples and the corresponding labels are widely adopted.
This survey presents a comprehensive review of foundational mixup methods and their applications.
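The convex sample/label combination that the mixup family builds on reduces to a few lines. This is a generic sketch of the core mixup step, not tied to the survey's taxonomy; the `alpha` default is a common choice, not a value from the survey.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Convexly combine two samples and their (one-hot) labels: the core mixup step."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient lambda ~ Beta(alpha, alpha)
    x_mixed = lam * x1 + (1.0 - lam) * x2
    y_mixed = lam * y1 + (1.0 - lam) * y2
    return x_mixed, y_mixed, lam
```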
arXiv Detail & Related papers (2024-09-08T19:32:22Z)
- Anchor-aware Deep Metric Learning for Audio-visual Retrieval [11.675472891647255]
Metric learning aims at capturing the underlying data structure and enhancing the performance of tasks like audio-visual cross-modal retrieval (AV-CMR).
Recent works employ sampling methods to select impactful data points from the embedding space during training.
However, the model training fails to fully explore the space due to the scarcity of training data points.
We propose an innovative Anchor-aware Deep Metric Learning (AADML) method to address this challenge.
arXiv Detail & Related papers (2024-04-21T22:44:44Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Investigating Personalization Methods in Text to Music Generation [21.71190700761388]
Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods.
For evaluation, we construct a novel dataset with prompts and music clips.
Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody.
arXiv Detail & Related papers (2023-09-20T08:36:34Z)
- PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification [76.40417061480564]
We present a novel data augmentation technique, dubbed PartMix, for part-based Visible-Infrared person Re-IDentification (VI-ReID) models.
We synthesize the augmented samples by mixing the part descriptors across the modalities to improve the performance of part-based VI-ReID models.
arXiv Detail & Related papers (2023-04-04T05:21:23Z)
- Improved singing voice separation with chromagram-based pitch-aware remixing [26.299721372221736]
We propose chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed.
We demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).
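The segment-pairing step behind pitch-aware remixing can be sketched as a similarity test between chroma vectors. This is a minimal illustration assuming precomputed 12-bin chroma features and a cosine-similarity criterion with a hypothetical threshold; the cited paper's exact alignment measure may differ.

```python
import numpy as np

def chroma_similarity(c1, c2):
    """Cosine similarity between two 12-bin chroma vectors."""
    return float(np.dot(c1, c2) / (np.linalg.norm(c1) * np.linalg.norm(c2)))

def pitch_aligned_pairs(vocal_chromas, accomp_chromas, threshold=0.8):
    """Index pairs of segments whose chroma vectors align above the threshold.

    Only these highly aligned pairs would be remixed into training mixtures.
    """
    return [
        (i, j)
        for i, cv in enumerate(vocal_chromas)
        for j, ca in enumerate(accomp_chromas)
        if chroma_similarity(cv, ca) >= threshold
    ]
```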
arXiv Detail & Related papers (2022-03-28T20:55:54Z)
- Source Separation-based Data Augmentation for Improved Joint Beat and Downbeat Tracking [33.05612957858605]
We propose to use a blind drum separation model to segregate the drum and non-drum sounds from each training audio signal.
We report experiments on four completely unseen test sets, validating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-06-16T11:09:05Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the mixture of Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.