HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation
- URL: http://arxiv.org/abs/2205.01019v1
- Date: Mon, 2 May 2022 16:45:20 GMT
- Title: HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation
- Authors: Weixing Wei, Peilin Li, Yi Yu, Wei Li
- Abstract summary: This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to capture the harmonic structure in logarithmic scale spectrograms efficiently.
We propose HarmoF0, a fully convolutional network, to evaluate the MRDC-Conv and other dilated convolutions in pitch estimation.
The results show that this model outperforms DeepF0, achieves state-of-the-art performance on three datasets, and reduces the number of parameters by more than 90%.
- Score: 7.5089093564620155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sounds, especially music, contain various harmonic components scattered in
the frequency dimension. It is difficult for normal convolutional neural
networks to observe these overtones. This paper introduces a multiple rates
dilated causal convolution (MRDC-Conv) method to capture the harmonic structure
in logarithmic scale spectrograms efficiently. The harmonic structure is helpful for
pitch estimation, which is important for many sound processing applications. We
propose HarmoF0, a fully convolutional network, to evaluate the MRDC-Conv and
other dilated convolutions in pitch estimation. The results show that this
model outperforms DeepF0, achieves state-of-the-art performance on three
datasets, and reduces the number of parameters by more than 90%. We also find
that it has stronger noise resistance and fewer octave errors.
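The abstract's point about logarithmic scale spectrograms can be illustrated with a small sketch. On a log-frequency axis, the k-th harmonic sits a fixed number of bins above the fundamental, roughly bins_per_octave * log2(k), regardless of the pitch itself, so one set of dilation offsets lines up with the overtones at every frequency bin. The code below is only an illustration of that idea, not the authors' MRDC-Conv implementation: the function names, the 48 bins per octave, the 4 harmonics, the uniform weights, and the upward direction of the offsets are all assumptions made for the example.

```python
import math

def harmonic_offsets(bins_per_octave, n_harmonics):
    """Bin offset of harmonic k above the fundamental on a log-frequency axis."""
    return [round(bins_per_octave * math.log2(k)) for k in range(1, n_harmonics + 1)]

def multi_rate_dilated_sum(frame, offsets, weights):
    """For each bin f, sum weights[i] * frame[f + offsets[i]].

    Bins that would fall past the top of the frame are treated as zero.
    """
    n = len(frame)
    out = [0.0] * n
    for f in range(n):
        for d, w in zip(offsets, weights):
            if f + d < n:
                out[f] += w * frame[f + d]
    return out

# Hypothetical setup: 48 bins per octave, first 4 harmonics.
offsets = harmonic_offsets(48, 4)
print(offsets)  # [0, 48, 76, 96]

# Synthetic spectrogram frame: a tone whose fundamental is at bin 10,
# with unit energy at each of its first 4 harmonics.
frame = [0.0] * 120
for d in offsets:
    frame[10 + d] = 1.0

# Uniform weights stand in for learned kernel weights (an assumption).
salience = multi_rate_dilated_sum(frame, offsets, [1.0] * len(offsets))
peak_bin = max(range(len(salience)), key=salience.__getitem__)
print(peak_bin)  # 10
```

Because the offsets depend only on the harmonic number, the same four taps collect all of the tone's harmonic energy at the fundamental's bin, which is the property that dilated convolutions on a log-frequency axis can exploit.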
Related papers
- Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features [10.480691005356967]
We propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR) and clarity (C50) across 10 frequency bands.
The proposed framework utilizes a novel feature named Spectro-Spatial Covariance Vector (SSCV), efficiently representing temporal, spectral as well as spatial information of the FOA signal.
arXiv Detail & Related papers (2024-11-05T15:20:23Z)
- Sine, Transient, Noise Neural Modeling of Piano Notes [0.0]
Three sub-modules learn components from piano recordings and generate harmonic, transient, and noise signals.
From singular notes, we emulate the coupling between different keys in trichords with a convolutional-based network.
Results show the model matches the partial distribution of the target while predicting the energy in the higher part of the spectrum presents more challenges.
arXiv Detail & Related papers (2024-09-10T13:48:18Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs.
Experiments on several benchmark datasets and neural architectures show that the binary network learned using our method achieves state-of-the-art accuracy.
arXiv Detail & Related papers (2021-03-01T08:25:26Z)
- DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals [11.939409227407769]
We propose a novel pitch estimation technique called DeepF0.
It leverages the available annotated data to directly learn from the raw audio in a data-driven manner.
arXiv Detail & Related papers (2021-02-11T23:11:22Z)
- Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
- Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks [7.088324036549911]
This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs).
We build upon an existing architecture to produce a pitch salience function of the input signal.
For training, we build a dataset that comprises several multi-track datasets of vocal quartets with F0 annotations.
arXiv Detail & Related papers (2020-09-09T09:11:49Z)