Real-time Percussive Technique Recognition and Embedding Learning for
the Acoustic Guitar
- URL: http://arxiv.org/abs/2307.07426v1
- Date: Thu, 13 Jul 2023 10:48:29 GMT
- Title: Real-time Percussive Technique Recognition and Embedding Learning for
the Acoustic Guitar
- Authors: Andrea Martelloni, Andrew P McPherson, Mathieu Barthet
- Abstract summary: Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments.
We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion.
We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs).
- Score: 2.5291326778025143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time music information retrieval (RT-MIR) has much potential to augment
the capabilities of traditional acoustic instruments. We develop RT-MIR
techniques aimed at augmenting percussive fingerstyle, which blends acoustic
guitar playing with guitar body percussion. We formulate several design
objectives for RT-MIR systems for augmented instrument performance: (i) causal
constraint, (ii) perceptually negligible action-to-sound latency, (iii) control
intimacy support, (iv) synthesis control support. We present and evaluate
real-time guitar body percussion recognition and embedding learning techniques
based on convolutional neural networks (CNNs) and CNNs jointly trained with
variational autoencoders (VAEs). We introduce a taxonomy of guitar body
percussion based on hand part and location. We follow a cross-dataset
evaluation approach by collecting three datasets labelled according to the
taxonomy. The embedding quality of the models is assessed using KL-Divergence
across distributions corresponding to different taxonomic classes. Results
indicate that the networks are strong classifiers especially in a simplified
2-class recognition task, and the VAEs yield improved class separation compared
to CNNs as evidenced by increased KL-Divergence across distributions. We argue
that the VAE embedding quality could support control intimacy and rich
interaction when the latent space's parameters are used to control an external
synthesis engine. Further design challenges around generalisation to different
datasets have been identified.
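The abstract assesses embedding quality via KL-Divergence between distributions of different taxonomic classes. A minimal sketch of how such a class-separation score might be computed, assuming class-conditional diagonal Gaussians fitted to the embeddings; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(P || Q) between two diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def class_separation(emb_a, emb_b):
    """Fit diagonal Gaussians to two sets of embeddings (one per
    taxonomic class) and return the symmetrised KL-Divergence:
    higher values indicate better-separated classes."""
    mu_a, var_a = emb_a.mean(axis=0), emb_a.var(axis=0) + 1e-8
    mu_b, var_b = emb_b.mean(axis=0), emb_b.var(axis=0) + 1e-8
    return (gaussian_kl(mu_a, var_a, mu_b, var_b)
            + gaussian_kl(mu_b, var_b, mu_a, var_a))

# Toy check: well-separated clusters yield a larger divergence
rng = np.random.default_rng(0)
near = class_separation(rng.normal(0.0, 1, (200, 8)),
                        rng.normal(0.1, 1, (200, 8)))
far = class_separation(rng.normal(0.0, 1, (200, 8)),
                       rng.normal(3.0, 1, (200, 8)))
assert far > near
```

Under this reading, the paper's finding that VAEs increase KL-Divergence across class distributions corresponds to larger values of such a separation score for the VAE latent space.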
Related papers
- On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence [0.12289361708127876]
This paper investigates the use of deep transfer learning based on convolutional neural networks (CNNs) to monitor bolted joints using acoustic emissions.
We evaluate the performance of our methodology using the ORION-AE benchmark, a structure composed of two thin beams connected by three bolts.
arXiv Detail & Related papers (2024-05-29T13:07:21Z)
- Understanding learning from EEG data: Combining machine learning and feature engineering based on hidden Markov models and mixed models [0.0]
Frontal theta oscillations are thought to play an important role in spatial navigation and memory.
EEG datasets are very complex, making changes in the neural signal related to behaviour difficult to interpret.
This paper proposes using hidden Markov and linear mixed effects models to extract features from EEG data.
arXiv Detail & Related papers (2023-11-14T12:24:12Z)
- An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, there is high demand for the generation of expressive sounds from high-level representations.
Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression.
This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z)
- Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers [7.817685358710508]
Zero-Shot (ZS) learning overcomes the restriction to a fixed set of training classes by predicting classes from adaptable class descriptions.
This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning.
arXiv Detail & Related papers (2022-08-24T09:48:22Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset, not for the target dataset, during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods.
The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead.
Results reveal that the MBDL receiver outperforms the SIC receiver with imperfect CSIR by a significant margin.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
- Is Disentanglement enough? On Latent Representations for Controllable Music Generation [78.8942067357231]
In the absence of a strong generative decoder, disentanglement does not necessarily imply controllability.
The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes.
arXiv Detail & Related papers (2021-08-01T18:37:43Z)
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts the noise into mel-spectrogram conditioned on the music score.
The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The use of deep neural networks as efficient feature extractors has enabled the direct use of raw audio signals for classification.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
arXiv Detail & Related papers (2021-02-13T13:44:46Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
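The "oracle principle with an ideal ratio mask" mentioned above is a standard construction in source separation: given ground-truth source magnitudes, the mask gives an upper bound on what a mask-based separator could achieve. A minimal sketch, not the paper's exact pipeline (function names are illustrative):

```python
import numpy as np

def ideal_ratio_mask(mag_target, mag_residual, eps=1e-8):
    """Oracle ideal ratio mask: per time-frequency bin, the fraction
    of the mixture's magnitude attributed to the target source."""
    return mag_target / (mag_target + mag_residual + eps)

def oracle_estimate(mix_stft, mag_target, mag_residual):
    """Mask the mixture STFT to obtain the oracle target estimate."""
    return ideal_ratio_mask(mag_target, mag_residual) * mix_stft

# Toy check on synthetic magnitude spectrograms: where the residual
# is silent the mask approaches 1, i.e. the bin is kept entirely.
target = np.full((4, 4), 2.0)
silent = np.zeros((4, 4))
mask = ideal_ratio_mask(target, silent)
assert np.all(mask > 0.999)
```

Because the mask uses ground-truth magnitudes, its separation quality can serve as a fast proxy for the best-case performance of trained mask-predicting networks, which appears to be the idea behind the training-free separability estimate described in the snippet.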
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.