wav2shape: Hearing the Shape of a Drum Machine
- URL: http://arxiv.org/abs/2007.10299v1
- Date: Mon, 20 Jul 2020 17:35:24 GMT
- Title: wav2shape: Hearing the Shape of a Drum Machine
- Authors: Han Han and Vincent Lostanlen
- Abstract summary: Disentangling and recovering physical attributes from a waveform is a challenging inverse problem in audio signal processing.
We propose to address this problem via a combination of time-frequency analysis and supervised machine learning.
- Score: 4.283530753133897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentangling and recovering physical attributes, such as shape and material,
from a few waveform examples is a challenging inverse problem in audio signal
processing, with numerous applications in musical acoustics as well as
structural engineering. We propose to address this problem via a combination of
time-frequency analysis and supervised machine learning. We start by
synthesizing a dataset of sounds using the functional transformation method.
Then, we represent each percussive sound in terms of its time-invariant
scattering transform coefficients and formulate the parametric estimation of
the resonator as multidimensional regression with a deep convolutional neural
network. We interpolate scattering coefficients over the surface of the drum as
a surrogate for potentially missing data, and study the response of the neural
network to interpolated samples. Lastly, we resynthesize drum sounds from
scattering coefficients, therefore paving the way towards a deep generative
model of drum sounds whose latent variables are physically interpretable.
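The pipeline described in the abstract (synthesize sounds, extract time-invariant features, regress the physical parameters) can be illustrated with a minimal sketch. Everything here is a simplified stand-in and not the authors' method: a sum of damped sinusoids replaces the functional transformation method, a time-averaged log-magnitude spectrum replaces the scattering transform, and ridge regression replaces the deep convolutional network. The function names, the parameter ranges, and the two regressed parameters (fundamental frequency and decay rate) are assumptions made for illustration.

```python
import numpy as np

def synthesize_drum(f0, decay, sr=8000, duration=0.25):
    """Toy modal synthesis: a sum of exponentially damped sinusoids.
    Assumption: stands in for the paper's functional transformation method."""
    t = np.arange(int(sr * duration)) / sr
    # Mode ratios roughly those of an ideal circular membrane (illustrative only).
    ratios = np.array([1.0, 1.59, 2.14, 2.30, 2.65])
    modes = [np.exp(-decay * r * t) * np.sin(2 * np.pi * f0 * r * t) / (k + 1)
             for k, r in enumerate(ratios)]
    return np.sum(modes, axis=0)

def time_averaged_features(x, n_fft=256, hop=128):
    """Time-averaged log-magnitude spectrum.
    Assumption: a crude time-invariant stand-in for scattering coefficients."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude.mean(axis=0))

def fit_ridge(X, Y, alpha=1e-3):
    """Ridge regression with a bias column.
    Assumption: replaces the paper's convolutional network."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(W, X):
    """Apply the fitted linear map to new feature vectors."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ W

# Build a small synthetic dataset of (f0 in Hz, decay in 1/s) pairs.
rng = np.random.default_rng(0)
params = np.column_stack([rng.uniform(100, 400, 200),
                          rng.uniform(5, 30, 200)])
X = np.array([time_averaged_features(synthesize_drum(f0, d))
              for f0, d in params])

# Train on the first 150 sounds, then estimate parameters of the rest.
W = fit_ridge(X[:150], params[:150])
pred = predict(W, X[150:])
```

Each row of `pred` holds the estimated fundamental frequency and decay rate for one held-out sound, which can be compared against `params[150:]` to gauge how well this toy inverse mapping works.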
Related papers
- Additive decomposition of one-dimensional signals using Transformers [48.7025991956527]
One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. Recent research suggests that applying the latest deep learning models to this problem presents an exciting, unexplored area with promising potential. We leverage the Transformer architecture to decompose signals into their constituent components.
arXiv Detail & Related papers (2025-06-06T10:09:40Z) - Conditional score-based diffusion models for solving inverse problems in mechanics [6.319616423658121]
We propose a framework to perform Bayesian inference using conditional score-based diffusion models.
Conditional score-based diffusion models are generative models that learn to approximate the score function of a conditional distribution.
We demonstrate the efficacy of the proposed approach on a suite of high-dimensional inverse problems in mechanics.
arXiv Detail & Related papers (2024-06-19T02:09:15Z) - Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, the generation of expressive sounds based on high-level representations is in high demand.
Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression.
This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z) - Deep learning for full-field ultrasonic characterization [7.120879473925905]
This study takes advantage of recent advances in machine learning to establish a physics-based data analytic platform.
Two logics, namely the direct inversion and physics-informed neural networks (PINNs), are explored.
arXiv Detail & Related papers (2023-01-06T05:01:05Z) - Multimodal Exponentially Modified Gaussian Oscillators [4.233733499457509]
This study presents a three-stage Multimodal Exponentially Modified Gaussian (MEMG) model with an optional oscillating term.
With this, synthetic ultrasound signals suffering from artifacts can be fully recovered.
Real data experimentation is carried out to demonstrate the classification capability of the acquired features.
arXiv Detail & Related papers (2022-09-25T11:48:09Z) - A deep learning driven pseudospectral PCE based FFT homogenization algorithm for complex microstructures [68.8204255655161]
It is shown that the proposed method is able to predict central moments of interest while being magnitudes faster to evaluate than traditional approaches.
arXiv Detail & Related papers (2021-10-26T07:02:14Z) - WaveTransform: Crafting Adversarial Examples via Input Decomposition [69.01794414018603]
We introduce 'WaveTransform', which creates adversarial noise corresponding to low-frequency and high-frequency subbands, separately (or in combination).
Experiments show that the proposed attack is effective against the defense algorithm and is also transferable across CNNs.
arXiv Detail & Related papers (2020-10-29T17:16:59Z) - HpRNet : Incorporating Residual Noise Modeling for Violin in a Variational Parametric Synthesizer [11.4219428942199]
We introduce a dataset of Carnatic Violin Recordings where bow noise is an integral part of the playing style of higher pitched notes.
We obtain insights about each of the harmonic and residual components of the signal, as well as their interdependence.
arXiv Detail & Related papers (2020-08-19T12:48:32Z) - Neural Granular Sound Synthesis [53.828476137089325]
Granular sound synthesis is a popular audio generation technique based on rearranging sequences of small waveform windows.
We show that generative neural networks can implement granular synthesis while alleviating most of its shortcomings.
arXiv Detail & Related papers (2020-08-04T08:08:00Z) - VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.