Learning Sparse Analytic Filters for Piano Transcription
- URL: http://arxiv.org/abs/2108.10382v1
- Date: Mon, 23 Aug 2021 19:41:11 GMT
- Title: Learning Sparse Analytic Filters for Piano Transcription
- Authors: Frank Cwitkowitz, Mojtaba Heydari and Zhiyao Duan
- Abstract summary: Filterbank learning has become an increasingly popular strategy for various audio-related machine learning tasks.
In this work, several variations of a filterbank learning module are investigated for piano transcription.
- Score: 21.352141245632247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, filterbank learning has become an increasingly popular
strategy for various audio-related machine learning tasks. This is partly due
to its ability to discover task-specific audio characteristics which can be
leveraged in downstream processing. It is also a natural extension of the
nearly ubiquitous deep learning methods employed to tackle a diverse array of
audio applications. In this work, several variations of a frontend filterbank
learning module are investigated for piano transcription, a challenging
low-level music information retrieval task. We build upon a standard piano
transcription model, modifying only the feature extraction stage. The
filterbank module is designed such that its complex filters are unconstrained
1D convolutional kernels with long receptive fields. Additional variations
employ the Hilbert transform to render the filters intrinsically analytic and
apply variational dropout to promote filterbank sparsity. Transcription results
are compared across all experiments, and we offer visualization and analysis of
the filterbanks.
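To make the modified feature extraction stage concrete, below is a minimal sketch (not the authors' code) of such a frontend in PyTorch: the real parts of the filters are unconstrained 1D convolutional kernels with long receptive fields, and the matching imaginary parts are derived with the Hilbert transform so that each complex filter is analytic by construction. The class and parameter names (AnalyticFilterbank, n_filters, kernel_len, hop) and the log-compressed magnitude output are illustrative assumptions, and the variational dropout the paper uses to promote filterbank sparsity is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hilbert_imag(kernels: torch.Tensor) -> torch.Tensor:
    """Imaginary part that makes each real kernel analytic (Hilbert transform).
    kernels: [n_filters, kernel_len] real tensor."""
    n = kernels.shape[-1]
    spec = torch.fft.fft(kernels, dim=-1)
    # Analytic-signal weighting: keep DC (and Nyquist), double positive
    # frequencies, zero out negative frequencies.
    h = torch.zeros(n, dtype=spec.dtype, device=kernels.device)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return torch.fft.ifft(spec * h, dim=-1).imag


class AnalyticFilterbank(nn.Module):
    """Learnable frontend: unconstrained real 1D kernels with long receptive
    fields; the imaginary kernels are derived on the fly via the Hilbert
    transform so the complex filters stay analytic throughout training."""

    def __init__(self, n_filters: int = 256, kernel_len: int = 2048, hop: int = 512):
        super().__init__()
        self.kernels = nn.Parameter(0.01 * torch.randn(n_filters, kernel_len))
        self.hop = hop

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: [batch, samples] -> features: [batch, n_filters, frames]
        real_k = self.kernels.unsqueeze(1)                  # [F, 1, K]
        imag_k = hilbert_imag(self.kernels).unsqueeze(1)    # [F, 1, K]
        x = audio.unsqueeze(1)                              # [B, 1, T]
        real = F.conv1d(x, real_k, stride=self.hop)
        imag = F.conv1d(x, imag_k, stride=self.hop)
        mag = torch.sqrt(real ** 2 + imag ** 2 + 1e-8)      # magnitude of complex response
        return torch.log1p(mag)                             # compressed time-frequency features


# Example: 2 seconds of 16 kHz audio -> [batch, n_filters, frames]
fb = AnalyticFilterbank()
features = fb(torch.randn(1, 32000))
```

Deriving the imaginary kernels from the learned real ones keeps each filter analytic throughout training, so the magnitude in the forward pass behaves like an envelope rather than oscillating with the carrier phase.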
Related papers
- Differentiable All-pole Filters for Time-varying Audio Systems [9.089836388818808]
We re-express a time-varying all-pole filter so that the gradient can be backpropagated through it.
This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation.
We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor.
arXiv Detail & Related papers (2024-04-11T17:55:05Z) - Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler [103.97487121678276]
Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs.
We propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with a Masked Filter Modeling (MFM) framework for filter pruning.
arXiv Detail & Related papers (2023-07-01T02:28:41Z) - Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z) - Filter-enhanced MLP is All You Need for Sequential Recommendation [89.0974365344997]
In online platforms, logged user behavior data inevitably contains noise.
We borrow the idea of filtering algorithms from signal processing, which attenuate noise in the frequency domain.
We propose FMLP-Rec, an all-MLP model with learnable filters for the sequential recommendation task.
arXiv Detail & Related papers (2022-02-28T05:49:35Z) - Learning Filterbanks for End-to-End Acoustic Beamforming [8.721077261941234]
Recent work on monaural source separation has shown that performance can be increased by using fully learned filterbanks with short windows.
On the other hand, for conventional beamforming techniques, performance increases with long analysis windows.
In this work we try to bridge the gap between these two worlds and explore fully end-to-end hybrid neural beamforming.
arXiv Detail & Related papers (2021-11-08T16:36:34Z) - Direct design of biquad filter cascades with deep learning by sampling random polynomials [5.1118282767275005]
In this work, we learn a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters.
We demonstrate our approach enables both fast and accurate estimation of filter coefficients given a desired response.
We compare our method against existing methods including modified Yule-Walker and gradient descent and show IIRNet is, on average, both faster and more accurate.
arXiv Detail & Related papers (2021-10-07T17:58:08Z) - Learning Versatile Convolution Filters for Efficient Visual Recognition [125.34595948003745]
This paper introduces versatile filters to construct efficient convolutional neural networks.
We conduct a theoretical analysis of network complexity and introduce an efficient convolution scheme.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters achieve accuracy comparable to that of the original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z) - Unsharp Mask Guided Filtering [53.14430987860308]
The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering.
We propose a new and simplified formulation of the guided filter inspired by unsharp masking.
Our formulation builds on a low-pass filtering prior and enables explicit structure transfer by estimating a single coefficient.
arXiv Detail & Related papers (2021-06-02T19:15:34Z) - Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters [64.46270549587004]
Convolutional neural networks (CNNs) have been successfully used in a range of tasks.
CNNs are often viewed as "black boxes" and lack interpretability.
We propose a novel strategy to train interpretable CNNs by encouraging class-specific filters.
arXiv Detail & Related papers (2020-07-16T09:12:26Z) - Exploring Filterbank Learning for Keyword Spotting [27.319236923928205]
This paper explores filterbank learning for keyword spotting (KWS).
Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank.
Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features.
arXiv Detail & Related papers (2020-05-30T08:11:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.