Related papers: Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

URL: http://arxiv.org/abs/2308.16540v1
Date: Thu, 31 Aug 2023 08:30:20 GMT
Title: Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals
Authors: Dhananjaya Gowda, Sudarsana Reddy Kadiri, Brad Story, Paavo Alku
Abstract summary: We propose a new method for the accurate estimation and tracking of formants in speech signals. TVQCP analysis combines three approaches to improve formant estimation and tracking. The proposed TVQCP method performs better than conventional and popular formant tracking tools.
Score: 17.69029813982043
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates is estimated using short-time analysis (e.g., 10–50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides a single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using $L_1$ optimization, and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100–200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrack
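
Illustrative sketch (not the authors' code): the third ingredient named in the abstract, time-varying linear prediction over a long window, can be pictured as fitting predictor coefficients that are themselves low-order polynomials in time, so the resonance trajectories come out continuous by construction. The version below is a simplified assumption-laden example: it uses a plain least-squares ($L_2$) fit, and it omits both the quasi-closed-phase temporal weighting and the $L_1$ optimization described in the abstract. The function names `fit_tvlp` and `resonances_at`, the polynomial basis, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def fit_tvlp(x, order=2, basis_order=1):
    """Least-squares time-varying linear prediction (simplified sketch).

    Models x[n] ~ sum_k a_k(t) * x[n-k], where t = n/N and each
    coefficient is a polynomial a_k(t) = sum_i b[k, i] * t**i.
    Returns b with shape (order, basis_order + 1).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(order, N)          # samples that have a full history
    t = n / N                        # normalized time in (0, 1)
    # One regressor column per (lag k, polynomial power i) pair.
    cols = [x[n - k] * t**i
            for k in range(1, order + 1)
            for i in range(basis_order + 1)]
    A = np.stack(cols, axis=1)
    b, *_ = np.linalg.lstsq(A, x[n], rcond=None)
    return b.reshape(order, basis_order + 1)

def resonances_at(b, t, fs):
    """Resonance frequencies (Hz) of the predictor evaluated at time t."""
    powers = t ** np.arange(b.shape[1])
    a = b @ powers                   # a_1(t), ..., a_p(t)
    # Roots of z^p - a_1 z^(p-1) - ... - a_p are the poles of the model.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[roots.imag > 0]    # keep one pole of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

As a usage example under the same assumptions, calling `fit_tvlp` on a 200 ms window of a sampled sinusoid and then `resonances_at(b, 0.5, fs)` returns the sinusoid's frequency at the window midpoint; for speech, a higher `order` and the weighting and sparsity terms from the abstract would be needed.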

Related papers

DELTAv2: Accelerating Dense 3D Tracking [79.63990337419514]
We propose a novel algorithm for accelerating dense long-term 3D point tracking in videos. We introduce a coarse-to-fine strategy that begins tracking with a small subset of points and progressively expands the set of tracked trajectories. The newly added trajectories are initialized using a learnable module, which is trained end-to-end alongside the tracking network.
arXiv Detail & Related papers (2025-08-02T03:15:47Z)
From Target Tracking to Targeting Track -- Part III: Stochastic Process Modeling and Online Learning [18.8192435654239]
This study describes the target trajectory as a sample path of a stochastic process (SP). By adopting a deterministic-stochastic decomposition framework, we decompose the learning of the trajectory SP into two sequential stages. This leads to a Markov-free data-driven tracking approach that produces the continuous-time trajectory with minimal prior knowledge of the target dynamics.
arXiv Detail & Related papers (2025-03-03T12:04:38Z)
Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review [59.856222854472605]
This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. Practical applications in fields such as biology often require sample generation that maximizes specific metrics. We discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, and (3) connections between inference-time algorithms in language models and diffusion models.
arXiv Detail & Related papers (2025-01-16T17:37:35Z)
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking [41.889032460337226]
ProTracker is a novel framework for accurate and robust long-term dense tracking of arbitrary points in videos. This design effectively combines global semantic information with temporally aware low-level features. Experiments demonstrate that ProTracker attains state-of-the-art performance among optimization-based approaches.
arXiv Detail & Related papers (2025-01-06T18:55:52Z)
Dense Optical Tracking: Connecting the Dots [82.79642869586587]
DOT is a novel, simple and efficient method for solving the problem of point tracking in a video. We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal trackers" like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
arXiv Detail & Related papers (2023-12-01T18:59:59Z)
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments. Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods [19.88212227822267]
Two refined DeepFormants trackers were compared with the original DeepFormants and with five known traditional trackers. The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis.
arXiv Detail & Related papers (2023-08-17T15:32:32Z)
Formant Tracking Using Quasi-Closed Phase Forward-Backward Linear Prediction Analysis and Deep Neural Networks [48.98397553726019]
Formant tracking is investigated by using trackers based on dynamic programming (DP) and deep neural nets (DNNs). The six methods include linear prediction (LP) algorithms, weighted LP algorithms and the recently developed quasi-closed phase forward-backward (QCP-FB) method. A novel formant tracking approach, which combines benefits of deep learning and signal processing based on QCP-FB, was proposed.
arXiv Detail & Related papers (2022-01-05T10:27:07Z)
SoundDet: Polyphonic Sound Event Detection and Localization from Raw Waveform [48.68714598985078]
SoundDet is an end-to-end trainable and light-weight framework for polyphonic moving sound event detection and localization. SoundDet directly consumes the raw, multichannel waveform and treats the temporal sound event as a complete "sound-object" to be detected. A dense sound proposal event map is then constructed to handle the challenges of predicting events with large varying temporal duration.
arXiv Detail & Related papers (2021-06-13T11:43:41Z)
Uncertainty-Aware Signal Temporal Logic [21.626420725274208]
Existing temporal logic inference methods mostly neglect uncertainties in the data. We propose two uncertainty-aware signal temporal logic (STL) inference approaches to classify the undesired behaviors and desired behaviors of a system.
arXiv Detail & Related papers (2021-05-24T21:26:57Z)
On projection methods for functional time series forecasting [0.0]
Two nonparametric methods are presented for forecasting functional time series (FTS). We address both one-step-ahead forecasting and dynamic updating. The methods are applied to simulated data, daily electricity demand, and NOx emissions.
arXiv Detail & Related papers (2021-05-10T14:24:38Z)
FlowMOT: 3D Multi-Object Tracking by Scene Flow Association [9.480272707157747]
We propose a LiDAR-based 3D MOT framework named FlowMOT, which integrates point-wise motion information with the traditional matching algorithm. Our approach outperforms recent end-to-end methods and achieves competitive performance with the state-of-the-art filter-based method.
arXiv Detail & Related papers (2020-12-14T14:03:48Z)
Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence. We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
Learning to Optimize Non-Rigid Tracking [54.94145312763044]
We employ learnable optimizations to improve robustness and speed up solver convergence. First, we upgrade the tracking objective by integrating an alignment data term on deep features which are learned end-to-end through CNN. Second, we bridge the gap between the preconditioning technique and learning method by introducing a ConditionNet which is trained to generate a preconditioner.
arXiv Detail & Related papers (2020-03-27T04:40:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.