Perceptual Musical Features for Interpretable Audio Tagging
- URL: http://arxiv.org/abs/2312.11234v3
- Date: Fri, 23 Feb 2024 13:41:18 GMT
- Title: Perceptual Musical Features for Interpretable Audio Tagging
- Authors: Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos and Giorgos Stamou
- Abstract summary: This study explores the relevance of interpretability in the context of automatic music tagging.
We constructed a workflow that incorporates three different information extraction techniques.
We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset.
- Score: 2.1730712607705485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the age of music streaming platforms, the task of automatically tagging
music audio has garnered significant attention, driving researchers to devise
methods aimed at enhancing performance metrics on standard datasets. Most
recent approaches rely on deep neural networks, which, despite their impressive
performance, are opaque, making it difficult to explain their output for a
given input. While the issue of interpretability has been emphasized in
other fields like medicine, it has not received attention in music-related
tasks. In this study, we explored the relevance of interpretability in the
context of automatic music tagging. We constructed a workflow that incorporates
three different information extraction techniques: a) leveraging symbolic
knowledge, b) utilizing auxiliary deep neural networks, and c) employing signal
processing to extract perceptual features from audio files. These features were
subsequently used to train an interpretable machine-learning model for tag
prediction. We conducted experiments on two datasets, namely the MTG-Jamendo
dataset and the GTZAN dataset. Our method outperformed the baseline
models in both tasks and, in certain instances, was competitive
with the current state of the art. We conclude that there are use cases where
the deterioration in performance is outweighed by the value of
interpretability.
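The last step of the workflow described above, training an interpretable model on extracted features, can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature names, the values, the "energetic" tag, and the choice of logistic regression as the interpretable model are all assumptions made for illustration. The point is only that each learned weight can be read as the contribution of one named feature to the tag decision.

```python
import math

# Hypothetical perceptual features per track, scaled to [0, 1].
# Names, values, and the binary "energetic" tag are invented for this sketch.
FEATURE_NAMES = ["tempo", "brightness", "percussiveness"]
TRACKS = [
    ([0.9, 0.8, 0.9], 1),
    ([0.8, 0.7, 0.8], 1),
    ([0.7, 0.9, 0.7], 1),
    ([0.2, 0.3, 0.1], 0),
    ([0.3, 0.2, 0.2], 0),
    ([0.1, 0.4, 0.3], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that the tag applies to feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(data, epochs=2000, lr=0.5):
    """Fit weights and bias with plain batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            err = predict_proba(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


weights, bias = train_logistic_regression(TRACKS)

# The explanation is the model itself: one signed weight per named feature.
for name, w in zip(FEATURE_NAMES, weights):
    print(f"{name}: {w:+.2f}")
```

A prediction for a new track is then `predict_proba(weights, bias, features)`, and the per-feature products `w * x` show which features pushed the score up or down.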
Related papers
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
arXiv Detail & Related papers (2022-02-20T07:32:12Z)
- Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on such features from a popular open-source music corpus.
They apply unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset.
The unsupervised training algorithm enhances their proposed neural network to achieve an accuracy of 90.36% for successful music feature detection.
arXiv Detail & Related papers (2020-08-31T13:57:31Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations [32.456824945999465]
We propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags.
We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks.
arXiv Detail & Related papers (2020-06-15T13:17:18Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)