Unsupervised Classification of Voiced Speech and Pitch Tracking Using
Forward-Backward Kalman Filtering
- URL: http://arxiv.org/abs/2103.01173v1
- Date: Mon, 1 Mar 2021 18:13:23 GMT
- Title: Unsupervised Classification of Voiced Speech and Pitch Tracking Using
Forward-Backward Kalman Filtering
- Authors: Benedikt Boenninghoff, Robert M. Nickel, Steffen Zeiler, Dorothea
Kolossa
- Abstract summary: We present a new algorithm that integrates the three subtasks into a single procedure.
The algorithm can be applied to pre-recorded speech utterances in the presence of considerable amounts of background noise.
- Score: 14.950964357181524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The detection of voiced speech, the estimation of the fundamental frequency,
and the tracking of pitch values over time are crucial subtasks for a variety
of speech processing techniques. Many different algorithms have been developed
for each of the three subtasks. We present a new algorithm that integrates the
three subtasks into a single procedure. The algorithm can be applied to
pre-recorded speech utterances in the presence of considerable amounts of
background noise. We combine a collection of standard metrics, such as the
zero-crossing rate, to formulate an unsupervised voicing
classifier. The estimation of pitch values is accomplished with a hybrid
autocorrelation-based technique. We propose a forward-backward Kalman filter to
smooth the estimated pitch contour. Experiments show that
the proposed method compares favorably with current state-of-the-art pitch
detection algorithms.
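The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the metrics are combined, what makes the autocorrelation technique "hybrid", or which state model the Kalman filter uses. The sketch below assumes a plain zero-crossing rate as the only voicing cue, a plain autocorrelation peak picker for pitch, and a scalar random-walk Kalman filter with a backward Rauch-Tung-Striebel pass for smoothing. All function names and the parameters `fmin`, `fmax`, `q`, and `r` are hypothetical.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (one possible voicing cue)."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate in Hz from the strongest autocorrelation peak.

    Assumption: plain autocorrelation, not the paper's hybrid variant.
    fmin/fmax bound the lag search range and are illustrative values.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / lag

def forward_backward_kalman(z, q=25.0, r=400.0):
    """Smooth a pitch contour z with a forward Kalman pass and a backward RTS pass.

    Assumption: scalar random-walk state model x[k] = x[k-1] + w,
    with process variance q and measurement variance r (both illustrative).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    x_pred, p_pred = np.zeros(n), np.zeros(n)  # one-step predictions
    x_filt, p_filt = np.zeros(n), np.zeros(n)  # filtered estimates
    x, p = z[0], r                             # initialise from the first measurement
    for k in range(n):
        # Predict: the random walk keeps the mean and inflates the variance.
        x_pred[k] = x
        p_pred[k] = p + q if k > 0 else p
        # Update with the measurement z[k].
        gain = p_pred[k] / (p_pred[k] + r)
        x = x_pred[k] + gain * (z[k] - x_pred[k])
        p = (1.0 - gain) * p_pred[k]
        x_filt[k], p_filt[k] = x, p
    # Backward pass: refine each filtered estimate using later frames.
    x_smooth = x_filt.copy()
    for k in range(n - 2, -1, -1):
        c = p_filt[k] / p_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
    return x_smooth

# Usage on synthetic data: a 200 Hz tone and a noisy pitch contour.
fs = 16000
t = np.arange(640) / fs
tone = np.sin(2 * np.pi * 200.0 * t)
print("ZCR:", zero_crossing_rate(tone))
print("pitch estimate (Hz):", autocorr_pitch(tone, fs))
rng = np.random.default_rng(0)
contour = 200.0 + rng.normal(0.0, 20.0, size=100)
print("smoothed contour std:", np.std(forward_backward_kalman(contour)))
```

The backward pass is what restricts this kind of smoother to pre-recorded utterances, as the abstract notes: each smoothed value depends on frames that come later in time.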
Related papers
- Robust detection of overlapping bioacoustic sound events [16.976684123806653]
We introduce an onset-based detection method which we name Voxaboxen.
For each time window, Voxaboxen predicts whether it contains the start of a vocalization and how long the vocalization is.
We release a new dataset designed to measure performance on detecting overlapping vocalizations.
arXiv Detail & Related papers (2025-03-04T08:26:03Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- A New Adaptive Noise Covariance Matrices Estimation and Filtering Method: Application to Multi-Object Tracking [6.571006663689735]
Kalman filters are widely used for object tracking, where process and measurement noise are usually considered accurately known and constant.
This paper proposes a new estimation-correction closed-loop estimation method to estimate the Kalman filter process and measurement noise covariance matrices online.
arXiv Detail & Related papers (2021-12-20T03:11:48Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z)
- Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization [113.19483349876668]
This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model.
It yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions.
arXiv Detail & Related papers (2021-02-28T07:52:20Z)
- A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z)
- Evaluating the reliability of acoustic speech embeddings [10.5754802112615]
Speech embeddings are fixed-size acoustic representations of variable-length speech sequences.
Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods.
We find that overall, ABX and MAP correlate with one another and with frequency estimation.
arXiv Detail & Related papers (2020-07-27T13:24:09Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.