Modeling Dependent Structure for Utterances in ASR Evaluation
- URL: http://arxiv.org/abs/2209.05281v1
- Date: Wed, 7 Sep 2022 21:51:06 GMT
- Title: Modeling Dependent Structure for Utterances in ASR Evaluation
- Authors: Zhe Liu and Fuchun Peng
- Abstract summary: Bootstrap resampling has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluations.
A blockwise bootstrap approach has also been proposed: by dividing utterances into uncorrelated blocks, it resamples these blocks instead of the original data.
We show that the resulting variance estimator for WER is consistent under mild conditions.
- Score: 16.559092192445917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The bootstrap resampling method has been popular for performing significance
analysis on word error rate (WER) in automatic speech recognition (ASR)
evaluations. To deal with the issue of dependent speech data, the blockwise
bootstrap approach has also been proposed: by dividing utterances into
uncorrelated blocks, it resamples these blocks instead of the original data.
However, it is always nontrivial to uncover the dependent structure among
utterances, which could lead to subjective findings in statistical testing. In
this paper, we present graphical lasso based methods to explicitly model such
dependency and estimate the independent blocks of utterances in a rigorous way.
Then the blockwise bootstrap is applied on top of the inferred blocks. We show
that the resulting variance estimator for WER is consistent under mild
conditions. We also demonstrate the validity of the proposed approach on
LibriSpeech data.
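The pipeline in the abstract can be sketched in three steps: estimate a sparse dependency graph over utterances with the graphical lasso, take the connected components of that graph as independent blocks, and then run the blockwise bootstrap over those blocks to estimate the variance of WER. The following is a minimal illustrative sketch, not the authors' implementation; the toy data, the group structure, the precision-matrix threshold, and the regularization strength `alpha` are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Toy setup: 12 utterances whose statistics are correlated within three
# latent groups (e.g. same speaker or acoustic condition). Rows of Z are
# hypothetical repeated observations; columns are utterances.
n_utts, n_obs = 12, 400
group = np.repeat([0, 1, 2], 4)
latent = rng.normal(size=(n_obs, 3))
Z = latent[:, group] + 0.5 * rng.normal(size=(n_obs, n_utts))

# (1) Graphical lasso: sparse precision matrix over utterances.
model = GraphicalLasso(alpha=0.2).fit(Z)
adj = np.abs(model.precision_) > 1e-4  # nonzero entries = dependence edges
np.fill_diagonal(adj, False)

# (2) Connected components of the dependency graph = independent blocks.
n_blocks, labels = connected_components(adj, directed=False)
blocks = [np.flatnonzero(labels == b) for b in range(n_blocks)]

# (3) Blockwise bootstrap of WER = total errors / total reference words:
# resample whole blocks (not individual utterances) with replacement.
errors = rng.integers(0, 4, size=n_utts)      # toy per-utterance error counts
ref_words = rng.integers(8, 15, size=n_utts)  # toy reference word counts
boot_wers = []
for _ in range(2000):
    pick = rng.integers(0, n_blocks, size=n_blocks)
    idx = np.concatenate([blocks[b] for b in pick])
    boot_wers.append(errors[idx].sum() / ref_words[idx].sum())

print(f"blocks: {n_blocks}, bootstrap WER variance: {np.var(boot_wers):.2e}")
```

Resampling blocks rather than individual utterances is what keeps the variance estimate honest when nearby utterances are correlated; resampling utterances independently would understate the variance.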
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- Unlearning-based Neural Interpretations [51.99182464831169]
We show that current baselines defined using static functions are biased, fragile and manipulable.
We propose UNI to compute an (un)learnable, debiased and adaptive baseline by perturbing the input towards an unlearning direction of steepest ascent.
arXiv Detail & Related papers (2024-10-10T16:02:39Z)
- Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learner's oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- BASS: Block-wise Adaptation for Speech Summarization [47.518484305407185]
We develop a method that allows one to train summarization models on very long sequences in an incremental manner.
Speech summarization is realized as a streaming process, where hypothesis summaries are updated every block.
Experiments on the How2 dataset demonstrate that the proposed block-wise training method improves by 3 points absolute on ROUGE-L over a truncated input baseline.
arXiv Detail & Related papers (2023-07-17T03:31:36Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Zero-Shot Automatic Pronunciation Assessment [19.971348810774046]
We propose a novel zero-shot APA method based on the pre-trained acoustic model, HuBERT.
Experimental results on speechocean762 demonstrate that the proposed method achieves comparable performance to supervised regression baselines.
arXiv Detail & Related papers (2023-05-31T05:17:17Z)
- Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
- AB/BA analysis: A framework for estimating keyword spotting recall improvement while maintaining audio privacy [0.0]
KWS is designed to only collect data when the keyword is present, limiting the availability of hard samples that may contain false negatives.
We propose an evaluation technique which we call AB/BA analysis.
We show that AB/BA analysis is successful at measuring recall improvement in conjunction with the trade-off in relative false positive rate.
arXiv Detail & Related papers (2022-04-18T13:52:22Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- TSInsight: A local-global attribution framework for interpretability in time-series data [5.174367472975529]
We propose attaching an auto-encoder to the classifier with a sparsity-inducing norm on its output, and fine-tuning it based on the gradients from the classifier and a reconstruction penalty.
TSInsight learns to preserve features that are important for prediction by the classifier and suppresses those that are irrelevant.
In contrast to most other attribution frameworks, TSInsight is capable of generating both instance-based and model-based explanations.
arXiv Detail & Related papers (2020-04-06T19:34:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.