Fast Word Error Rate Estimation Using Self-Supervised Representations For Speech And Text
- URL: http://arxiv.org/abs/2310.08225v1
- Date: Thu, 12 Oct 2023 11:17:40 GMT
- Title: Fast Word Error Rate Estimation Using Self-Supervised Representations For Speech And Text
- Authors: Chanho Park, Chengsong Lu, Mingjie Chen, Thomas Hain
- Abstract summary: The quality of automatic speech recognition (ASR) is typically measured by word error rate (WER).
WER estimation is a task aiming to predict the WER of an ASR system, given a speech utterance and a transcription.
This paper introduces a Fast WER estimator (Fe-WER) using self-supervised learning representation (SSLR).
- Score: 23.25173244408922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quality of automatic speech recognition (ASR) is typically measured by
word error rate (WER). WER estimation is a task aiming to predict the WER of an
ASR system, given a speech utterance and a transcription. This task has gained
increasing attention as advanced ASR systems are trained on large amounts of
data. In such settings, WER estimation becomes necessary in many scenarios, for
example, selecting training data with unknown transcription quality or
estimating the testing performance of an ASR system without ground truth
transcriptions. Facing large amounts of data, the computation efficiency of a
WER estimator becomes essential in practical applications. However, previous
work has rarely treated efficiency as a priority. In this paper, a Fast WER
estimator (Fe-WER) using self-supervised learning representation (SSLR) is
introduced. The estimator is built upon SSLR aggregated by average pooling. The
results show that Fe-WER outperformed the e-WER3 baseline on Ted-Lium3 by a
relative 19.69% in root mean square error and 7.16% in Pearson correlation
coefficient. Moreover, the duration-weighted WER estimate was 10.43% against a
target of 10.88%. Lastly, inference was about 4x faster in terms of real-time
factor.
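As a refresher on the metric being estimated: WER is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference transcription, normalized by the reference length. A minimal sketch using standard dynamic programming — an illustration of the metric, not the paper's code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat", "the cat sat down")` is one insertion over three reference words, i.e. 1/3. A WER *estimator* such as Fe-WER predicts this value directly from the speech and text, without computing the alignment against a ground-truth reference.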
Related papers
- Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models [69.38024658668887]
Current evaluation methods for event extraction rely on token-level exact match.
We propose RAEE, an automatic evaluation framework that accurately assesses event extraction results at the semantic level instead of the token level.
arXiv Detail & Related papers (2024-10-12T07:54:01Z)
- Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learner's oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z)
- Automatic Speech Recognition System-Independent Word Error Rate Estimation [23.25173244408922]
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems.
In this paper, a hypothesis generation method for ASR System-Independent WER estimation is proposed.
arXiv Detail & Related papers (2024-04-25T16:57:05Z)
- TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train a Confidence Estimation Model (CEM).
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims to assess the relevance of a signal-processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) method to develop low-resource accent adaptation for text-to-speech (TTS).
A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
- A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data regimes differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z)
- Accelerating Attention through Gradient-Based Learned Runtime Pruning [9.109136535767478]
Self-attention is a key enabler of state-of-the-art accuracy for transformer-based Natural Language Processing models.
This paper formulates the pruning search through a soft differentiable regularizer integrated into the loss function of the training.
We devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with bit-level early termination microarchitectural mechanism.
arXiv Detail & Related papers (2022-04-07T05:31:13Z)
- Self-supervised Representation Learning with Relative Predictive Coding [102.93854542031396]
Relative Predictive Coding (RPC) is a new contrastive representation learning objective.
RPC maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance.
We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks.
arXiv Detail & Related papers (2021-03-21T01:04:24Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.