LibriSpeech-PC: Benchmark for Evaluation of Punctuation and
Capitalization Capabilities of end-to-end ASR Models
- URL: http://arxiv.org/abs/2310.02943v1
- Date: Wed, 4 Oct 2023 16:23:37 GMT
- Title: LibriSpeech-PC: Benchmark for Evaluation of Punctuation and
Capitalization Capabilities of end-to-end ASR Models
- Authors: Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina,
Vitaly Lavrukhin, Boris Ginsburg
- Abstract summary: We introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models.
The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline models.
- Score: 58.790604613878216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional automatic speech recognition (ASR) models output lower-cased
words without punctuation marks, which reduces readability and necessitates a
subsequent text processing model to convert ASR transcripts into a proper
format. Simultaneously, the development of end-to-end ASR models capable of
predicting punctuation and capitalization presents several challenges,
primarily due to limited data availability and shortcomings in the existing
evaluation methods, such as inadequate assessment of punctuation prediction. In
this paper, we introduce a LibriSpeech-PC benchmark designed to assess the
punctuation and capitalization prediction capabilities of end-to-end ASR
models. The benchmark includes a LibriSpeech-PC dataset with restored
punctuation and capitalization, a novel evaluation metric called Punctuation
Error Rate (PER) that focuses on punctuation marks, and initial baseline
models. All code, data, and models are publicly available.
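For intuition, here is a minimal sketch of a punctuation-focused error rate. It assumes PER is computed like WER but restricted to punctuation: align the punctuation-mark sequences of reference and hypothesis by edit distance, then divide the total substitutions, deletions, and insertions by the number of reference punctuation marks. The official LibriSpeech-PC definition may differ in detail; see the paper and released code.

```python
# Hedged sketch of a WER-style Punctuation Error Rate (PER).
# Assumption: PER = (S + D + I) / (# punctuation marks in reference),
# computed over punctuation-mark sequences only; the official
# LibriSpeech-PC implementation may differ in detail.
PUNCT = {".", ",", "?", "!"}

def punct_seq(text: str) -> list[str]:
    """Extract punctuation marks in order of appearance."""
    return [ch for ch in text if ch in PUNCT]

def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (S + D + I) between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # deletions
    for j in range(n + 1):
        d[0][j] = j                     # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[m][n]

def per(reference: str, hypothesis: str) -> float:
    ref_p, hyp_p = punct_seq(reference), punct_seq(hypothesis)
    if not ref_p:
        return 0.0 if not hyp_p else float("inf")
    return edit_distance(ref_p, hyp_p) / len(ref_p)

# ref marks [',', '.', '?'] vs. hyp marks ['.', '.'] -> 2 errors, PER = 2/3
print(per("Hello, world. How are you?", "Hello world. How are you."))
```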
Related papers
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences, owing both to the models' limitations and to characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS).
arXiv Detail & Related papers (2024-09-30T12:36:25Z) - Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices [8.77712061194924]
We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models.
Our algorithm performs grapheme-to-phoneme (G2P) conversion directly from wordpieces into phonemes, avoiding explicit word representations.
We achieved up to a 15.2% relative reduction in sentence error rate (SER) on a test set with contextually relevant entities (a toy sketch of the wordpiece-to-phoneme idea follows this entry).
arXiv Detail & Related papers (2024-09-24T21:42:25Z) - What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations [0.0]
- What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations [0.0]
We investigate the text normalization routine employed by leading ASR models, including OpenAI Whisper, Meta's MMS, Seamless, and Assembly AI's Conformer.
Our research reveals that current text normalization practices, while aiming to standardize ASR outputs for fair comparison, are fundamentally flawed when applied to Indic scripts.
We propose a shift towards developing text normalization routines that leverage native linguistic expertise (a toy illustration of the pitfall follows this entry).
arXiv Detail & Related papers (2024-09-04T05:08:23Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list (a hypothetical prompt sketch follows this entry).
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Boosting Punctuation Restoration with Data Generation and Reinforcement
- Boosting Punctuation Restoration with Data Generation and Reinforcement Learning [70.26450819702728]
Punctuation restoration is an important task in automatic speech recognition (ASR).
The discrepancy between written, punctuated texts and ASR output limits the usability of written texts for training punctuation restoration systems for ASR.
This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap.
arXiv Detail & Related papers (2023-07-24T17:22:04Z) - TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and
Punctuation model evaluation and selection [1.4720080476520687]
Punctuation and segmentation are key to readability in Automatic Speech Recognition.
Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability.
We present TRScore, a novel readability measure using the GPT model to evaluate different segmentation and punctuation systems.
arXiv Detail & Related papers (2022-10-27T01:11:32Z) - SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z) - Discriminative Self-training for Punctuation Prediction [5.398944179152948]
Punctuation prediction for automatic speech recognition (ASR) output plays a crucial role in improving the readability of ASR transcripts.
However, achieving good performance on punctuation prediction often requires large amounts of labeled speech transcripts.
We propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts (a generic loss sketch follows this entry).
arXiv Detail & Related papers (2021-04-21T03:32:47Z) - Robust Prediction of Punctuation and Truecasing for Medical ASR [18.08508027663331]
- Robust Prediction of Punctuation and Truecasing for Medical ASR [18.08508027663331]
This paper proposes a conditional joint modeling framework for prediction of punctuation and truecasing.
We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data.
arXiv Detail & Related papers (2020-07-04T07:15:13Z) - Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)