Exploring single-song autoencoding schemes for audio-based music
structure analysis
- URL: http://arxiv.org/abs/2110.14437v1
- Date: Wed, 27 Oct 2021 13:48:25 GMT
- Title: Exploring single-song autoencoding schemes for audio-based music
structure analysis
- Authors: Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot
- Abstract summary: This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song.
We report that the proposed unsupervised autoencoding scheme achieves the level of performance of supervised state-of-the-art methods with a 3-second tolerance when using a Log Mel spectrogram representation on the RWC-Pop dataset.
- Score: 6.037383467521294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of deep neural networks to learn complex data relations and
representations is established nowadays, but it generally relies on large sets
of training data. This work explores a "piece-specific" autoencoding scheme, in
which a low-dimensional autoencoder is trained to learn a latent/compressed
representation specific to a given song, which can then be used to infer the
song structure. Such a model relies on neither supervision nor annotations,
which are well known to be tedious to collect and often ambiguous in Music
Structure Analysis. We report that the proposed unsupervised autoencoding
scheme achieves the level of performance of supervised state-of-the-art methods
with a 3-second tolerance when using a Log Mel spectrogram representation on the
RWC-Pop dataset.
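The piece-specific idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes the song has already been converted to a matrix with one feature vector per time segment (for example, rows derived from a Log Mel spectrogram), and it swaps in a tiny tanh autoencoder trained by plain gradient descent in NumPy. The function names, the latent dimension, and the cosine self-similarity step are all hypothetical choices made for illustration.

```python
import numpy as np


def train_single_song_autoencoder(segments, latent_dim=4, epochs=500, lr=0.1, seed=0):
    """Fit a small autoencoder on the segments of ONE song (illustrative sketch).

    segments: array of shape (n_segments, n_features).
    Returns the latent codes and the reconstruction loss at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_segments, n_features = segments.shape
    # Standardise each feature so one learning rate suits all dimensions.
    x = (segments - segments.mean(axis=0)) / (segments.std(axis=0) + 1e-8)
    w_enc = rng.normal(scale=0.1, size=(n_features, latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, n_features))
    losses = []
    for _ in range(epochs):
        z = np.tanh(x @ w_enc)        # encoder: compress each segment
        x_hat = z @ w_dec             # decoder: linear reconstruction
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        grad_out = 2.0 * err / err.size
        grad_dec = z.T @ grad_out
        grad_pre = (grad_out @ w_dec.T) * (1.0 - z ** 2)
        grad_enc = x.T @ grad_pre
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return np.tanh(x @ w_enc), losses


def cosine_self_similarity(latents):
    """Cosine similarity between every pair of latent codes."""
    norms = np.linalg.norm(latents, axis=1, keepdims=True)
    unit = latents / np.maximum(norms, 1e-12)
    return unit @ unit.T


if __name__ == "__main__":
    # Toy "song": two section types repeated, standing in for real features.
    rng = np.random.default_rng(1)
    a, b = rng.normal(size=12), rng.normal(size=12)
    song = np.stack([a, a, b, b, a, a, b, b]) + 0.05 * rng.normal(size=(8, 12))
    latents, losses = train_single_song_autoencoder(song, latent_dim=2)
    print("first/last loss:", losses[0], losses[-1])
    print(np.round(cosine_self_similarity(latents), 2))
```

Because the model only ever sees one song, no annotations or external training data are involved; a structure estimate would then be read off the self-similarity matrix by a separate segmentation step, which this sketch leaves out.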
Related papers
- Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning
of Music Audio [10.946347283718923]
We present PECMAE, an interpretable model for music audio classification based on prototype learning.
Our model is based on a previous method, APNet, which jointly learns an autoencoder and a prototypical network.
We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings.
arXiv Detail & Related papers (2024-02-14T17:13:36Z)
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music
Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
- TimeMAE: Self-Supervised Representations of Time Series with Decoupled
Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferrable time series representations based on transformer networks.
The TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme.
To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- Learning Hierarchical Metrical Structure Beyond Measures [3.7294116330265394]
Hierarchical structure annotations are helpful for music information retrieval and computer musicology.
We propose a data-driven approach to automatically extract hierarchical metrical structures from scores.
We show by experiments that the proposed method performs better than the rule-based approach under different orchestration settings.
arXiv Detail & Related papers (2022-09-21T11:08:52Z)
- Cadence Detection in Symbolic Classical Music using Graph Neural
Networks [7.817685358710508]
We present a graph representation of symbolic scores as an intermediate means to solve the cadence detection task.
We approach cadence detection as an imbalanced node classification problem using a Graph Convolutional Network.
Our experiments suggest that graph convolution can learn non-local features that assist in cadence detection, removing the need to devise specialized features that encode non-local context.
arXiv Detail & Related papers (2022-08-31T12:39:57Z)
- Barwise Compression Schemes for Audio-Based Music Structure Analysis [4.39160562548524]
Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections.
We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song.
In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods.
arXiv Detail & Related papers (2022-02-10T12:23:57Z)
- PINs: Progressive Implicit Networks for Multi-Scale Neural
Representations [68.73195473089324]
We propose a progressive positional encoding, exposing a hierarchical structure to incremental sets of frequency encodings.
Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive levels of detail.
Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity and training speed compared to baselines.
arXiv Detail & Related papers (2022-02-09T20:33:37Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A
study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Fast accuracy estimation of deep learning based multi-class musical
source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.