Related papers: wav2shape: Hearing the Shape of a Drum Machine

URL: http://arxiv.org/abs/2007.10299v1
Date: Mon, 20 Jul 2020 17:35:24 GMT
Title: wav2shape: Hearing the Shape of a Drum Machine
Authors: Han Han and Vincent Lostanlen
Abstract summary: Disentangling and recovering physical attributes from a waveform is a challenging inverse problem in audio signal processing. We propose to address this problem via a combination of time-frequency analysis and supervised machine learning.
Score: 4.283530753133897
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Disentangling and recovering physical attributes, such as shape and material, from a few waveform examples is a challenging inverse problem in audio signal processing, with numerous applications in musical acoustics as well as structural engineering. We propose to address this problem via a combination of time-frequency analysis and supervised machine learning. We start by synthesizing a dataset of sounds using the functional transformation method. Then, we represent each percussive sound in terms of its time-invariant scattering transform coefficients and formulate the parametric estimation of the resonator as multidimensional regression with a deep convolutional neural network. We interpolate scattering coefficients over the surface of the drum as a surrogate for potentially missing data, and study the response of the neural network to interpolated samples. Lastly, we resynthesize drum sounds from scattering coefficients, therefore paving the way towards a deep generative model of drum sounds whose latent variables are physically interpretable.

Related papers

Learning spectral density functions in open quantum systems [0.0]
We use exactly solvable spin-boson models with pure-dephasing and amplitude-damping channels to reconstruct spectral density functions from noisy data. Our neural network robustly reconstructs structured densities by filtering noisy signals and learning general functional dependencies.
arXiv Detail & Related papers (2026-02-27T14:45:59Z)
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking [49.1352577985191]
We present a comprehensive analysis of how two-layer neural networks learn features to solve the modular addition task. Our work provides a full mechanistic interpretation of the learned model and a theoretical explanation of its training dynamics.
arXiv Detail & Related papers (2026-02-18T20:25:13Z)
SCRAPL: Scattering Transform with Random Paths for Machine Learning [19.198253857377054]
"Scattering transform with Random Paths for machine Learning" (SCRAPL) is a scheme for efficient evaluation of multivariable scattering transforms. SCRAPL demodulates spectrotemporal patterns at multiple scales and rates, allowing a fine characterization of intermittent auditory textures. We apply SCRAPL to differentiable digital signal processing (DDSP), specifically, unsupervised sound matching of a granular synthesizer and the Roland TR-808 drum machine.
arXiv Detail & Related papers (2026-02-11T18:57:08Z)
Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high dimensional dynamical systems driven by random matrices. We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z)
Additive decomposition of one-dimensional signals using Transformers [48.7025991956527]
One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. Recent research suggests that applying the latest deep learning models to this problem presents an exciting, unexplored area with promising potential. We leverage the Transformer architecture to decompose signals into their constituent components.
arXiv Detail & Related papers (2025-06-06T10:09:40Z)
Conditional score-based diffusion models for solving inverse problems in mechanics [6.319616423658121]
We propose a framework to perform Bayesian inference using conditional score-based diffusion models. Conditional score-based diffusion models are generative models that learn to approximate the score function of a conditional distribution. We demonstrate the efficacy of the proposed approach on a suite of high-dimensional inverse problems in mechanics.
arXiv Detail & Related papers (2024-06-19T02:09:15Z)
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality. To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches. Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z)
Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data. In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, the generation of expressive sounds based on high-level representations is in high demand. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression. This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z)
Deep learning for full-field ultrasonic characterization [7.120879473925905]
This study takes advantage of recent advances in machine learning to establish a physics-based data analytic platform. Two logics, namely the direct inversion and physics-informed neural networks (PINNs), are explored.
arXiv Detail & Related papers (2023-01-06T05:01:05Z)
Multimodal Exponentially Modified Gaussian Oscillators [4.233733499457509]
This study presents a three-stage Multimodal Exponentially Modified Gaussian (MEMG) model with an optional oscillating term. With this, synthetic ultrasound signals suffering from artifacts can be fully recovered. Real data experimentation is carried out to demonstrate the classification capability of the acquired features.
arXiv Detail & Related papers (2022-09-25T11:48:09Z)
A deep learning driven pseudospectral PCE based FFT homogenization algorithm for complex microstructures [68.8204255655161]
It is shown that the proposed method is able to predict central moments of interest while being orders of magnitude faster to evaluate than traditional approaches.
arXiv Detail & Related papers (2021-10-26T07:02:14Z)
WaveTransform: Crafting Adversarial Examples via Input Decomposition [69.01794414018603]
We introduce 'WaveTransform', which creates adversarial noise corresponding to low-frequency and high-frequency subbands, separately or in combination. Experiments show that the proposed attack is effective against the defense algorithm and is also transferable across CNNs.
arXiv Detail & Related papers (2020-10-29T17:16:59Z)
HpRNet: Incorporating Residual Noise Modeling for Violin in a Variational Parametric Synthesizer [11.4219428942199]
We introduce a dataset of Carnatic Violin Recordings where bow noise is an integral part of the playing style of higher pitched notes. We obtain insights about each of the harmonic and residual components of the signal, as well as their interdependence.
arXiv Detail & Related papers (2020-08-19T12:48:32Z)
Neural Granular Sound Synthesis [53.828476137089325]
Granular sound synthesis is a popular audio generation technique based on rearranging sequences of small waveform windows. We show that generative neural networks can implement granular synthesis while alleviating most of its shortcomings.
arXiv Detail & Related papers (2020-08-04T08:08:00Z)
VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth, a Variational Parametric Synthesizer that utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation. We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.