The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification
- URL: http://arxiv.org/abs/2409.17069v1
- Date: Wed, 25 Sep 2024 16:29:21 GMT
- Title: The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification
- Authors: Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo
- Abstract summary: We show that models trained with perceptual metrics as loss functions can capture perceptually meaningful features.
We demonstrate that using features extracted from autoencoders trained with perceptual losses can improve performance on music understanding tasks.
- Score: 42.14708549155406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The subjective quality of natural signals can be approximated with objective perceptual metrics. Designed to approximate the perceptual behaviour of human observers, perceptual metrics often reflect structures found in natural signals and neurological pathways. Models trained with perceptual metrics as loss functions can capture perceptually meaningful features from the structures held within these metrics. We demonstrate that using features extracted from autoencoders trained with perceptual losses can improve performance on music understanding tasks, i.e. genre classification, over using these metrics directly as distances when learning a classifier. This result suggests improved generalisation to novel signals when using perceptual metrics as loss functions for representation learning.
Related papers
- Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data [44.659718609385315]
Perceptual metrics are traditionally used to evaluate the quality of natural signals, such as images and audio.
We show that training with perceptual losses improves the reconstruction of spectrograms and re-synthesized audio at test time over models trained with a standard Euclidean loss.
arXiv Detail & Related papers (2023-12-06T12:27:25Z)
- A Symbolic Representation of Human Posture for Interpretable Learning and Reasoning [2.678461526933908]
We introduce a qualitative spatial reasoning approach that describes the human posture in terms that are more familiar to people.
This paper explores the derivation of our symbolic representation at two levels of detail and its preliminary use as features for interpretable activity recognition.
arXiv Detail & Related papers (2022-10-17T12:22:13Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Classification and Adversarial examples in an Overparameterized Linear Model: A Signal Processing Perspective [10.515544361834241]
State-of-the-art deep learning classifiers are highly susceptible to infinitesimal adversarial perturbations.
We find that the learned model is susceptible to adversaries in an intermediate regime where classification generalizes but regression does not.
Despite the adversarial susceptibility, we find that classification with these features can be easier than the more commonly studied "independent feature" models.
arXiv Detail & Related papers (2021-09-27T17:35:42Z)
- A Novel mapping for visual to auditory sensory substitution [0.0]
Visual information can be converted into an audio stream via sensory substitution devices.
Blind object recognition for real objects reached 88.05 on average.
arXiv Detail & Related papers (2021-06-14T14:14:50Z)
- On the relation between statistical learning and perceptual distances [61.25815733012866]
We show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood.
We also explore the relation between distances induced by autoencoders and the probability distribution of the data used for training them.
arXiv Detail & Related papers (2021-06-08T14:56:56Z)
- Visualizing and Understanding Vision System [0.6510507449705342]
We use a vision recognition-reconstruction network (RRN) to investigate the development, recognition, learning and forgetting mechanisms.
In the digit recognition study, we observe that the RRN can maintain an object-invariant representation under various viewing conditions.
In the learning and forgetting study, novel structure recognition is implemented by adjusting all synapses by small magnitudes while the pattern specificities of the original synaptic connectivity are preserved.
arXiv Detail & Related papers (2020-06-11T07:08:49Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.