Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability
- URL: http://arxiv.org/abs/2106.02016v1
- Date: Thu, 3 Jun 2021 17:35:14 GMT
- Title: Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability
- Authors: Somnath Roy
- Abstract summary: State-of-the-art systems have achieved a word error rate (WER) of less than 5%.
Semantic-WER (SWER) is a metric to evaluate the ASR transcripts for downstream applications in general.
- Score: 1.599072005190786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in supervised, semi-supervised and self-supervised deep
learning algorithms have shown significant improvements in the performance of
automatic speech recognition (ASR) systems. State-of-the-art systems have
achieved a word error rate (WER) of less than 5%. However, researchers have long
argued that WER is unsuitable for evaluating ASR systems on downstream tasks
such as spoken language understanding (SLU) and information retrieval, because
WER operates at the surface level and incorporates no syntactic or semantic
knowledge. The current work proposes Semantic-WER (SWER), a metric for
evaluating ASR transcripts for downstream applications in general. SWER can be
easily customized for any downstream task.
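The abstract's claim that WER operates purely at the surface level is easiest to see with a small example. The following minimal sketch (illustrative only, not taken from the paper, with invented example sentences) computes standard WER; both hypotheses receive the same score even though only the first preserves the destination entity a downstream SLU system would need.

```python
# Minimal word error rate (WER) sketch: WER counts substitutions, deletions
# and insertions at the word level, with no notion of semantic importance.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    ref = "please book a flight to boston tomorrow"
    hyp_benign = "please reserve a flight to boston tomorrow"  # synonym swap
    hyp_harmful = "please book a flight to austin tomorrow"    # wrong destination
    print(wer(ref, hyp_benign), wer(ref, hyp_harmful))          # both 1/7 ~= 0.143
```

Both errors cost exactly one substitution, so WER rates the two transcripts identically; a semantics-aware metric such as the proposed SWER is intended to distinguish cases like these.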
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques [17.166092544686553]
This study benchmarks Speech Emotion Recognition using ASR transcripts with varying Word Error Rates (WERs) from eleven models on three well-known corpora.
We propose a unified ASR error-robust framework integrating ASR error correction and modality-gated fusion, achieving lower WER and higher SER results compared to the best-performing ASR transcript.
arXiv Detail & Related papers (2024-06-12T15:59:25Z)
- Automatic Speech Recognition System-Independent Word Error Rate Estimation [23.25173244408922]
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems.
In this paper, a hypothesis generation method for ASR System-Independent WER estimation is proposed.
arXiv Detail & Related papers (2024-04-25T16:57:05Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders [18.849741353784328]
In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read.
We used children's speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study.
The accuracy of the ASR systems varies for different reading tasks and word types.
arXiv Detail & Related papers (2023-06-07T06:58:38Z)
- Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z)
- Contextual-Utterance Training for Automatic Speech Recognition [65.4571135368178]
We propose a contextual-utterance training technique which makes use of the previous and future contextual utterances.
Also, we propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems.
The proposed technique is able to reduce both the WER and the average last token emission latency by more than 6% and 40ms relative.
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
- Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding [26.958001571944678]
We propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems.
We demonstrate the effectiveness of our proposed metric on various downstream tasks, including intent recognition, semantic parsing, and named entity recognition. (An illustrative embedding-based sketch of this idea follows this list.)
arXiv Detail & Related papers (2021-04-05T20:25:07Z)
- Knowledge Distillation for Improved Accuracy in Spoken Question Answering [63.72278693825945]
We devise a training strategy to perform knowledge distillation from spoken documents and written counterparts.
Our work makes a step towards distilling knowledge from the language model as a supervision signal.
Experiments demonstrate that our approach outperforms several state-of-the-art language models on the Spoken-SQuAD dataset.
arXiv Detail & Related papers (2020-10-21T15:18:01Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
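To make the embedding-based alternatives surveyed above (e.g., Semantic Distance / SemDist) concrete, here is a conceptual sketch of a semantic-distance style score for ASR hypotheses. It is not the method of any paper listed here; it assumes the third-party sentence-transformers library and the all-MiniLM-L6-v2 model purely for illustration.

```python
# Conceptual semantic-distance sketch (NOT the SemDist or SWER implementation):
# score an ASR hypothesis by how far its sentence embedding lies from the
# reference embedding, so semantically harmless errors cost little while
# meaning-changing errors cost a lot.
from sentence_transformers import SentenceTransformer, util  # assumed dependency

def semantic_distance(reference: str, hypothesis: str, model) -> float:
    """Return 1 - cosine similarity between sentence embeddings (0 = same meaning)."""
    ref_emb, hyp_emb = model.encode([reference, hypothesis])
    return 1.0 - float(util.cos_sim(ref_emb, hyp_emb))

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    ref = "please book a flight to boston tomorrow"
    # Same WER (one substitution each), very different semantic cost.
    for hyp in ("please reserve a flight to boston tomorrow",
                "please book a flight to austin tomorrow"):
        print(hyp, "->", round(semantic_distance(ref, hyp, model), 3))
```

Unlike surface-level WER, such an embedding-based score typically penalizes the entity error (boston -> austin) far more than the synonym substitution, which is the gap that metrics like SemDist and the proposed SWER aim to close.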