Real-time Percussive Technique Recognition and Embedding Learning for
the Acoustic Guitar
- URL: http://arxiv.org/abs/2307.07426v1
- Date: Thu, 13 Jul 2023 10:48:29 GMT
- Title: Real-time Percussive Technique Recognition and Embedding Learning for
the Acoustic Guitar
- Authors: Andrea Martelloni, Andrew P McPherson, Mathieu Barthet
- Abstract summary: Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments.
We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion.
We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs).
- Score: 2.5291326778025143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time music information retrieval (RT-MIR) has much potential to augment
the capabilities of traditional acoustic instruments. We develop RT-MIR
techniques aimed at augmenting percussive fingerstyle, which blends acoustic
guitar playing with guitar body percussion. We formulate several design
objectives for RT-MIR systems for augmented instrument performance: (i) causal
constraint, (ii) perceptually negligible action-to-sound latency, (iii) control
intimacy support, (iv) synthesis control support. We present and evaluate
real-time guitar body percussion recognition and embedding learning techniques
based on convolutional neural networks (CNNs) and CNNs jointly trained with
variational autoencoders (VAEs). We introduce a taxonomy of guitar body
percussion based on hand part and location. We follow a cross-dataset
evaluation approach by collecting three datasets labelled according to the
taxonomy. The embedding quality of the models is assessed using KL-Divergence
across distributions corresponding to different taxonomic classes. Results
indicate that the networks are strong classifiers especially in a simplified
2-class recognition task, and the VAEs yield improved class separation compared
to CNNs as evidenced by increased KL-Divergence across distributions. We argue
that the VAE embedding quality could support control intimacy and rich
interaction when the latent space's parameters are used to control an external
synthesis engine. Further design challenges around generalisation to different
datasets have been identified.
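The abstract assesses embedding quality via KL-Divergence between distributions of different taxonomic classes. A minimal sketch of how such a class-separation score might be computed, assuming class-conditional diagonal Gaussians fitted to the embeddings; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(P || Q) between two diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def class_separation(emb_a, emb_b):
    """Fit diagonal Gaussians to two sets of embeddings (one per
    taxonomic class) and return the symmetrised KL-Divergence:
    higher values indicate better-separated classes."""
    mu_a, var_a = emb_a.mean(axis=0), emb_a.var(axis=0) + 1e-8
    mu_b, var_b = emb_b.mean(axis=0), emb_b.var(axis=0) + 1e-8
    return (gaussian_kl(mu_a, var_a, mu_b, var_b)
            + gaussian_kl(mu_b, var_b, mu_a, var_a))

# Toy check: well-separated clusters yield a larger divergence
rng = np.random.default_rng(0)
near = class_separation(rng.normal(0.0, 1, (200, 8)),
                        rng.normal(0.1, 1, (200, 8)))
far = class_separation(rng.normal(0.0, 1, (200, 8)),
                       rng.normal(3.0, 1, (200, 8)))
assert far > near
```

Under this reading, the paper's finding that VAEs increase KL-Divergence across class distributions corresponds to larger values of such a separation score for the VAE latent space.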
Related papers
- On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence [0.12289361708127876]
This paper investigates the use of deep transfer learning based on convolutional neural networks (CNNs) to monitor bolted joints using acoustic emissions.
We evaluate the performance of our methodology using the ORION-AE benchmark, a structure composed of two thin beams connected by three bolts.
arXiv Detail & Related papers (2024-05-29T13:07:21Z)
- Understanding learning from EEG data: Combining machine learning and feature engineering based on hidden Markov models and mixed models [0.0]
Frontal theta oscillations are thought to play an important role in spatial navigation and memory.
EEG datasets are very complex, making changes in the neural signal related to behaviour difficult to interpret.
This paper proposes using hidden Markov and linear mixed effects models to extract features from EEG data.
arXiv Detail & Related papers (2023-11-14T12:24:12Z)
- An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, there is high demand for the generation of expressive sounds from high-level representations.
Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression.
This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z)
- Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers [7.817685358710508]
Zero-Shot (ZS) learning overcomes the restriction to a fixed set of training classes by predicting classes from adaptable class descriptions.
This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning.
arXiv Detail & Related papers (2022-08-24T09:48:22Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset, not for the target dataset, during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods.
The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead.
Results reveal that the MBDL receiver outperforms the SIC receiver with imperfect CSIR by a significant margin.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
- Is Disentanglement enough? On Latent Representations for Controllable Music Generation [78.8942067357231]
In the absence of a strong generative decoder, disentanglement does not necessarily imply controllability.
The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes.
arXiv Detail & Related papers (2021-08-01T18:37:43Z)
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts the noise into mel-spectrogram conditioned on the music score.
The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The use of deep neural networks as efficient feature extractors has enabled the direct use of raw audio signals for classification.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
arXiv Detail & Related papers (2021-02-13T13:44:46Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
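The "oracle principle with an ideal ratio mask" mentioned above is a standard construction in source separation: given ground-truth source magnitudes, the mask gives an upper bound on what a mask-based separator could achieve. A minimal sketch, not the paper's exact pipeline (function names are illustrative):

```python
import numpy as np

def ideal_ratio_mask(mag_target, mag_residual, eps=1e-8):
    """Oracle ideal ratio mask: per time-frequency bin, the fraction
    of the mixture's magnitude attributed to the target source."""
    return mag_target / (mag_target + mag_residual + eps)

def oracle_estimate(mix_stft, mag_target, mag_residual):
    """Mask the mixture STFT to obtain the oracle target estimate."""
    return ideal_ratio_mask(mag_target, mag_residual) * mix_stft

# Toy check on synthetic magnitude spectrograms: where the residual
# is silent the mask approaches 1, i.e. the bin is kept entirely.
target = np.full((4, 4), 2.0)
silent = np.zeros((4, 4))
mask = ideal_ratio_mask(target, silent)
assert np.all(mask > 0.999)
```

Because the mask uses ground-truth magnitudes, its separation quality can serve as a fast proxy for the best-case performance of trained mask-predicting networks, which appears to be the idea behind the training-free separability estimate described in the snippet.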
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.