Related papers: EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal

URL: http://arxiv.org/abs/2406.17732v1
Date: Mon, 24 Jun 2024 07:50:52 GMT
Title: EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal
Authors: Modan Tailleur, Julien Pinquier, Laurent Millot, Corsin Vogel, Mathieu Lagrange
Abstract summary: The dataset consists of 760 audio excerpts of 1 second to 30 seconds long, totaling about 100 min of audio material. The distortion taxonomy within this dataset encompasses four distinct distortion techniques and three vocal effects. Performance of a state-of-the-art deep learning model is evaluated for two different classification tasks related to vocal techniques.
Score: 3.462957144298955
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we introduce the Extreme Metal Vocals Dataset, which comprises a collection of recordings of extreme vocal techniques performed within the realm of heavy metal music. The dataset consists of 760 audio excerpts of 1 second to 30 seconds long, totaling about 100 min of audio material, roughly composed of 60 minutes of distorted voices and 40 minutes of clear voice recordings. These vocal recordings are from 27 different singers and are provided without accompanying musical instruments or post-processing effects. The distortion taxonomy within this dataset encompasses four distinct distortion techniques and three vocal effects, all performed in different pitch ranges. Performance of a state-of-the-art deep learning model is evaluated for two different classification tasks related to vocal techniques, demonstrating the potential of this resource for the audio processing community.

Related papers

Machine Learning Approaches to Vocal Register Classification in Contemporary Male Pop Music [49.1574468325115]
In pop music, where a single artist may use a variety of timbres and textures to achieve a desired quality, it can be difficult to identify what vocal register within the vocal range a singer is using. This paper presents two methods for classifying vocal registers in an audio signal of male pop music through the analysis of textural features of mel-spectrogram images.
arXiv Detail & Related papers (2025-05-16T15:41:28Z)
Unleashing the Power of Natural Audio Featuring Multiple Sound Sources [54.38251699625379]
Universal sound separation aims to extract clean audio tracks corresponding to distinct events from mixed audio. We propose ClearSep, a framework that employs a data engine to decompose complex naturally mixed audio into multiple independent tracks. In experiments, ClearSep achieves state-of-the-art performance across multiple sound separation tasks.
arXiv Detail & Related papers (2025-04-24T17:58:21Z)
Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano [13.796982484176207]
We present a novel approach to evaluating Mezzo-soprano vocal techniques using deep learning models. We employ deep learning models pre-trained on the ImageNet and Urbansound8k datasets. Our experimental results indicate that transfer learning increases the overall accuracy (OAcc) of all models by an average of 8.3%.
arXiv Detail & Related papers (2024-10-30T13:17:13Z)
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks [52.30565320125514]
GTSinger is a large global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores. We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset. We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv Detail & Related papers (2024-09-20T18:18:14Z)
Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of the single-speaker. It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice. Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z)
Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio. Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
Make-A-Voice: Unified Voice Synthesis With Discrete Representation [77.3998611565557]
Make-A-Voice is a unified framework for synthesizing and manipulating voice signals from discrete representations. We show that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models.
arXiv Detail & Related papers (2023-05-30T17:59:26Z)
Detection and classification of vocal productions in large scale audio recordings [0.12930503923129208]
We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings. The pipeline is based on a deep neural network and addresses both issues simultaneously. We test it on two different natural audio data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home.
arXiv Detail & Related papers (2023-02-14T14:07:09Z)
A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data. The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata on instrumentation, geography and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)
Scream Detection in Heavy Metal Music [79.68916470119743]
Harsh vocal effects such as screams or growls are more common in heavy metal vocals than traditionally sung vocals. This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music.
arXiv Detail & Related papers (2022-05-11T15:48:56Z)
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition [13.373579620368046]
We have created the VocalSound dataset, consisting of over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs. Experiments show that the vocal sound recognition performance of a model can be significantly improved by 41.9% by adding the VocalSound dataset to an existing dataset as training material.
arXiv Detail & Related papers (2022-05-06T18:08:18Z)
Audiovisual Singing Voice Separation [25.862550744570324]
The video model takes mouth movement as input and fuses it into the feature embeddings of an audio-based separation framework. We create two audiovisual singing performance datasets for training and evaluation. The proposed method outperforms audio-based methods in terms of separation quality on most test recordings.
arXiv Detail & Related papers (2021-07-01T06:04:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.