Boosting the Predictive Accuracy of Singer Identification Using Discrete
Wavelet Transform for Feature Extraction
- URL: http://arxiv.org/abs/2102.00550v1
- Date: Sun, 31 Jan 2021 21:58:55 GMT
- Title: Boosting the Predictive Accuracy of Singer Identification Using Discrete
Wavelet Transform for Feature Extraction
- Authors: Victoire Djimna Noyum, Younous Perieukeu Mofenjou, Cyrille Feudjio,
Alkan G\"oktug and Ernest Fokou\'e
- Abstract summary: We study the performance of the Discrete Wavelet Transform (DWT) in comparison to the Mel Frequency Cepstral Coefficient (MFCC).
We conclude that the best identification system consists of the DWT (db4) feature extraction introduced in this work combined with a linear support vector machine for identification resulting in a mean accuracy of 83.96%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given the diversity and growth of today's musical landscape,
searching for a specific song has become increasingly complex. The identity of
the singer facilitates this search. In this project, we focus on the problem of
identifying the singer by using different methods for feature extraction.
Particularly, we introduce the Discrete Wavelet Transform (DWT) for this
purpose. To the best of our knowledge, DWT has never been used this way before
in the context of singer identification. This process consists of three crucial
parts. First, the vocal signal is separated from the background music by using
the Robust Principal Component Analysis (RPCA). Second, features from the
obtained vocal signal are extracted. Here, the goal is to study the performance
of the Discrete Wavelet Transform (DWT) in comparison to the Mel Frequency
Cepstral Coefficient (MFCC) which is the most used technique in audio signals.
Finally, we proceed with the identification of the singer, where two methods
were tested: the Support Vector Machine (SVM) and the Gaussian Mixture
Model (GMM). We conclude that, for a dataset of 4 singers and 200 songs, the
best identification system consists of the DWT (db4) feature extraction
introduced in this work combined with a linear support vector machine for
identification resulting in a mean accuracy of 83.96%.
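The feature-extraction step of the pipeline above can be sketched in code. The paper uses the db4 wavelet (typically obtained through a library such as PyWavelets); the dependency-free sketch below uses the simpler Haar wavelet (db1) instead, and pools each level's detail coefficients into mean/std statistics. That pooling choice is an assumption for illustration only; the paper does not specify how its DWT coefficients are turned into a fixed-length feature vector.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists. The paper uses the
    db4 wavelet (e.g. pywt.dwt(signal, 'db4') with PyWavelets); Haar (db1)
    is shown here only to keep the sketch dependency-free.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def dwt_features(signal, levels=3):
    """Mean and standard deviation of the detail coefficients at each level --
    one plausible way to build a fixed-length feature vector for a linear SVM
    (this pooling is a hypothetical choice, not taken from the paper)."""
    feats = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        mean = sum(detail) / len(detail)
        var = sum((d - mean) ** 2 for d in detail) / len(detail)
        feats.extend([mean, math.sqrt(var)])
    return feats

# Toy usage on a short synthetic frame (440 Hz tone sampled at 8 kHz)
frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(64)]
vec = dwt_features(frame, levels=3)
print(len(vec))  # 3 levels x (mean, std) = 6 features
```

In a full system, vectors like `vec` (computed on the RPCA-separated vocal signal) would be fed to a linear SVM classifier, e.g. scikit-learn's `LinearSVC`, to identify the singer.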
Related papers
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves 98.33%, 96.65%, and 96.62% of percentage of correct classification (PCC) on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
- Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation [0.0]
This study tackles the distinct separation of vocal components from musical spectrograms.
We employ the Short Time Fourier Transform (STFT) to extract audio waves into detailed frequency-time spectrograms.
We implement a UNet neural network to segment the spectrogram image, aiming to delineate and extract singing voice components accurately.
arXiv Detail & Related papers (2024-05-30T13:47:53Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- Late multimodal fusion for image and audio music transcription [0.0]
Multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities.
We study four combination approaches in order to merge, for the first time, the hypotheses regarding end-to-end OMR and AMT systems.
Two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks.
arXiv Detail & Related papers (2022-04-06T20:00:33Z)
- TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions.
We present an improved input representation, the Tone-CFP, that explicitly groups harmonics.
We also propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z)
- Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks [30.480360404811197]
A central task of a Disc Jockey (DJ) is to create a mixset of music with seamless transitions between adjacent tracks.
In this paper, we explore a data-driven approach that uses a generative adversarial network to create the song transition by learning from real-world DJ mixes.
arXiv Detail & Related papers (2021-10-13T06:25:52Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- On Transfer Learning of Traditional Frequency and Time Domain Features in Turning [1.0965065178451106]
We use traditional signal processing tools to identify chatter in accelerometer signals obtained from a turning experiment.
The tagged signals are then used to train a classifier.
Our results show that features extracted from the Fourier spectrum are the most informative when training a classifier and testing on data from the same cutting configuration.
arXiv Detail & Related papers (2020-08-28T14:47:57Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.