Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
- URL: http://arxiv.org/abs/2408.15616v1
- Date: Wed, 28 Aug 2024 08:14:51 GMT
- Title: Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
- Authors: Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann
- Abstract summary: The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR).
We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER.
We also provide an exemplary analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation.
- Score: 5.266869303483375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR). Transcripts are usually pre-processed by substituting specific characters to account for non-semantic differences. As a result of this normalisation, information on the accuracy of punctuation or capitalisation is lost. We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER and additional orthographic metrics. Transcription errors are also classified more granularly by existing string similarity and phonetic algorithms. An evaluation on several datasets demonstrates the practical equivalence of our approach compared to common WER computations. We also provide an exemplary analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation. The code is available open-source.
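As a point of reference for the baseline the paper extends, the common WER computation can be sketched as a token-based Levenshtein distance in Python. Note this is the standard destructive computation (tokens compared after normalisation), not the paper's extended non-destructive variant, which additionally preserves punctuation and capitalisation information:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: Levenshtein distance over word tokens, divided by
    the number of reference tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat", "the cat sat down")` yields one insertion over three reference tokens, i.e. 1/3. The paper's approach replaces the destructive pre-normalisation step with token attributes, so the same alignment also supports orthographic metrics such as a punctuation error rate.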
Related papers
- Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval [18.333752341467083]
The biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries.
This work proposes an approximation to cross-attention scoring based on vector quantization.
We show that retrieval-based shortlisting allows the system to efficiently leverage biasing catalogues of several thousand entries.
arXiv Detail & Related papers (2024-11-01T15:28:03Z)
- Using Similarity to Evaluate Factual Consistency in Summaries [2.7595794227140056]
Abstractive summarisers generate fluent summaries, but the factuality of the generated text is not guaranteed.
We propose a new zero-shot factuality evaluation metric, Sentence-BERTScore (SBERTScore), which compares sentences between the summary and the source document.
Our experiments indicate that each technique has different strengths, with SBERTScore particularly effective in identifying correct summaries.
arXiv Detail & Related papers (2024-09-23T15:02:38Z)
- SparseCL: Sparse Contrastive Learning for Contradiction Retrieval [87.02936971689817]
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods such as similarity search and cross-encoder models exhibit significant limitations.
We introduce SparseCL that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences.
arXiv Detail & Related papers (2024-06-15T21:57:03Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Optimized Tokenization for Transcribed Error Correction [10.297878672883973]
We show that the performance of correction models can be significantly increased by training solely using synthetic data.
Specifically, we show that synthetic data generated using the error distribution derived from a set of transcribed data outperforms the common approach of applying random perturbations.
arXiv Detail & Related papers (2023-10-16T12:14:21Z)
- Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm [45.42075576656938]
Contextual biasing refers to the problem of biasing automatic speech recognition systems towards rare entities.
We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching.
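As a minimal illustration of the matching core referenced above (not the paper's implementation, which applies biasing inside the ASR decoder), a Knuth-Morris-Pratt search can be sketched in Python; since the comparisons are generic, `kmp_find` works on strings and word-token lists alike:

```python
def kmp_prefix(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the KMP failure function)."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]  # fall back to the next-shortest border
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi

def kmp_find(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    pi = kmp_prefix(pattern)
    hits, k = [], 0
    for i, item in enumerate(text):
        while k > 0 and item != pattern[k]:
            k = pi[k - 1]
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - len(pattern) + 1)
            k = pi[k - 1]  # allow overlapping matches
    return hits
```

For instance, `kmp_find(["play", "taylor", "swift"], ["taylor", "swift"])` returns `[1]`, locating a rare-entity phrase in a token sequence in linear time.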
arXiv Detail & Related papers (2023-09-29T22:50:10Z)
- End-to-End Page-Level Assessment of Handwritten Text Recognition [69.55992406968495]
Handwritten text recognition (HTR) systems increasingly face end-to-end page-level transcription of a document.
Standard metrics do not take into account the inconsistencies that might appear.
We propose a two-fold evaluation in which transcription accuracy and reading-order (RO) goodness are considered separately.
arXiv Detail & Related papers (2023-01-14T15:43:07Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set-encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- Extracting Grammars from a Neural Network Parser for Anomaly Detection in Unknown Formats [79.6676793507792]
Reinforcement learning has recently shown promise as a technique for training an artificial neural network to parse sentences in some unknown format.
This paper presents procedures for extracting production rules from the neural network, and for using these rules to determine whether a given sentence is nominal or anomalous.
arXiv Detail & Related papers (2021-07-30T23:10:24Z)
- Automatic Vocabulary and Graph Verification for Accurate Loop Closure Detection [21.862978912891677]
Bag-of-Words (BoW) builds a visual vocabulary to associate features and then detect loops.
We propose a natural convergence criterion based on the comparison between the radii of nodes and the drifts of feature descriptors.
We present a novel topological graph verification method for validating candidate loops.
arXiv Detail & Related papers (2021-07-30T13:19:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.