Boosting Punctuation Restoration with Data Generation and Reinforcement
Learning
- URL: http://arxiv.org/abs/2307.12949v1
- Date: Mon, 24 Jul 2023 17:22:04 GMT
- Title: Boosting Punctuation Restoration with Data Generation and Reinforcement
Learning
- Authors: Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun
Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen
- Abstract summary: Punctuation restoration is an important task in automatic speech recognition (ASR)
The discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts.
This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap.
- Score: 70.26450819702728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Punctuation restoration is an important task in automatic speech recognition
(ASR) which aim to restore the syntactic structure of generated ASR texts to
improve readability. While punctuated texts are abundant from written
documents, the discrepancy between written punctuated texts and ASR texts
limits the usability of written texts in training punctuation restoration
systems for ASR texts. This paper proposes a reinforcement learning method to
exploit in-topic written texts and recent advances in large pre-trained
generative language models to bridge this gap. The experiments show that our
method achieves state-of-the-art performance on the ASR test set on two
benchmark datasets for punctuation restoration.
Related papers
- QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval [12.225881591629815]
In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching.
Recent studies mainly focus on improving the sentence embedding model or retrieval process.
We introduce a novel text augmentation framework for dense retrieval, which transforms raw documents into information-dense text formats.
arXiv Detail & Related papers (2024-07-29T17:39:08Z) - Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z) - An efficient text augmentation approach for contextualized Mandarin speech recognition [4.600045052545344]
Our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models.
To contextualize a pre-trained CIF-based ASR, we construct a codebook using limited speech-text data.
Our experiments on diverse Mandarin test sets demonstrate that our TA approach significantly boosts recognition performance.
arXiv Detail & Related papers (2024-06-14T11:53:14Z) - LibriSpeech-PC: Benchmark for Evaluation of Punctuation and
Capitalization Capabilities of end-to-end ASR Models [58.790604613878216]
We introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models.
The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline models.
arXiv Detail & Related papers (2023-10-04T16:23:37Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Text Generation with Speech Synthesis for ASR Data Augmentation [17.348764629839636]
We explore text augmentation for Automatic Speech Recognition (ASR) using large-scale pre-trained neural networks.
We find that neural models achieve 9%-15% relative WER improvement and outperform traditional methods.
arXiv Detail & Related papers (2023-05-22T18:45:20Z) - Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of an industry commonly-used streaming model, Transformer-Transducer (T-T)
We first propose a strategy to generate code-switching text data and then investigate injecting generated text into T-T model explicitly by Text-To-Speech (TTS) conversion or implicitly by tying speech and text latent spaces.
Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z) - A Benchmark Corpus for the Detection of Automatically Generated Text in
Academic Publications [0.02578242050187029]
This paper presents two datasets comprised of artificially generated research content.
In the first case, the content is completely generated by the GPT-2 model after a short prompt extracted from original papers.
The partial or hybrid dataset is created by replacing several sentences of abstracts with sentences that are generated by the Arxiv-NLP model.
We evaluate the quality of the datasets comparing the generated texts to aligned original texts using fluency metrics such as BLEU and ROUGE.
arXiv Detail & Related papers (2022-02-04T08:16:56Z) - Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR)
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z) - SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER)
arXiv Detail & Related papers (2020-03-25T09:20:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.