Scream Detection in Heavy Metal Music
- URL: http://arxiv.org/abs/2205.05580v1
- Date: Wed, 11 May 2022 15:48:56 GMT
- Title: Scream Detection in Heavy Metal Music
- Authors: Vedant Kalbag, Alexander Lerch
- Abstract summary: Harsh vocal effects such as screams or growls are more common in heavy metal vocals than the traditionally sung vocal.
This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music.
- Score: 79.68916470119743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Harsh vocal effects such as screams or growls are far more common in heavy
metal vocals than the traditionally sung vocal. This paper explores the problem
of detection and classification of extreme vocal techniques in heavy metal
music, specifically the identification of different scream techniques. We
investigate the suitability of various feature representations, including
cepstral, spectral, and temporal features as input representations for
classification. The main contributions of this work are (i) a manually
annotated dataset comprising over 280 minutes of heavy metal songs of various
genres with a statistical analysis of occurrences of different extreme vocal
techniques in heavy metal music, and (ii) a systematic study of different input
feature representations for the classification of heavy metal vocals.
Related papers
- BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning [74.84822135705025]
We introduce BASS, designed to evaluate music understanding and reasoning in audio language models.
BASS comprises 2658 questions spanning 12 tasks, 1993 unique songs, and over 138 hours of music.
We evaluate 14 open-source and frontier multimodal LMs, finding that even state-of-the-art models struggle on higher-level reasoning tasks.
arXiv Detail & Related papers (2026-02-03T23:40:31Z)
- A Music Information Retrieval Approach to Classify Sub-Genres in Role Playing Games [4.755549571193836]
Video game music (VGM) is often studied under the same lens as film music.
We extracted musical features from VGM in games from three sub-genres of Role-Playing Games (RPG).
This observed correlation may be used to further suggest such features are relevant to the expected storytelling elements or play mechanics associated with the sub-genre.
arXiv Detail & Related papers (2026-01-05T22:44:22Z)
- Machine Learning Approaches to Vocal Register Classification in Contemporary Male Pop Music [49.1574468325115]
In pop music, where a single artist may use a variety of timbres and textures to achieve a desired quality, it can be difficult to identify which vocal register within the vocal range a singer is using.
This paper presents two methods for classifying vocal registers in an audio signal of male pop music through the analysis of textural features of mel-spectrogram images.
arXiv Detail & Related papers (2025-05-16T15:41:28Z)
- Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification [0.0]
We present an innovative model for classifying music genres using attention-based temporal signature modeling.
Our approach captures the most temporally significant moments within each piece, crafting a unique "signature" for genre identification.
This work bridges the gap between technical classification tasks and the nuanced, human experience of genre.
arXiv Detail & Related papers (2024-11-18T21:57:03Z)
- GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks [52.30565320125514]
GTSinger is a large global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores.
We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset.
We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv Detail & Related papers (2024-09-20T18:18:14Z)
- EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal [3.462957144298955]
The dataset consists of 760 audio excerpts, each 1 to 30 seconds long, totaling about 100 minutes of audio material.
The distortion taxonomy within this dataset encompasses four distinct distortion techniques and three vocal effects.
Performance of a state-of-the-art deep learning model is evaluated for two different classification tasks related to vocal techniques.
arXiv Detail & Related papers (2024-06-24T07:50:52Z)
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data.
The dataset incorporates timestamped YouTube links for retrieving audio and video, along with rich metadata on instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)
- Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music [73.73045854068384]
We propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network.
The proposed network adopts pre-trained model parameters, and incorporates the genre adapters between layers to capture different genre peculiarities for lyrics-genre pairs.
Our experiments show that the proposed genre-conditioned network outperforms the existing lyrics transcription systems.
arXiv Detail & Related papers (2022-04-07T09:15:46Z)
- Musical Prosody-Driven Emotion Classification: Interpreting Vocalists Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We use a methodology of individual data collection from vocalists, with ground-truth labels provided personally by the artists themselves.
arXiv Detail & Related papers (2021-06-04T15:40:19Z)
- A General Framework for Learning Prosodic-Enhanced Representation of Rap Lyrics [21.944835086749375]
Learning and analyzing rap lyrics is a significant basis for many web applications.
We propose a hierarchical attention variational autoencoder framework (HAVAE).
A feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation.
arXiv Detail & Related papers (2021-03-23T15:13:21Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)