Improving Punctuation Restoration for Speech Transcripts via External Data
- URL: http://arxiv.org/abs/2110.00560v1
- Date: Fri, 1 Oct 2021 17:40:55 GMT
- Title: Improving Punctuation Restoration for Speech Transcripts via External Data
- Authors: Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan TN, Simon Corston-Oliver
- Abstract summary: We tackle the punctuation restoration problem specifically for noisy text.
We introduce a data sampling technique based on an n-gram language model to sample more training data.
The proposed approach outperforms the baseline with an improvement of 1.12% F1 score.
- Score: 1.4335946386597276
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic Speech Recognition (ASR) systems generally do not produce
punctuated transcripts. To make transcripts more readable and follow the
expected input format for downstream language models, it is necessary to add
punctuation marks. In this paper, we tackle the punctuation restoration problem
specifically for noisy text (e.g., phone conversation scenarios). To
leverage the available written text datasets, we introduce a data sampling
technique based on an n-gram language model to sample more training data that
are similar to our in-domain data. Moreover, we propose a two-stage fine-tuning
approach that utilizes the sampled external data as well as our in-domain
dataset for models based on BERT. Extensive experiments show that the proposed
approach outperforms the baseline with an improvement of 1.12% in F1 score.
Related papers
- Spontaneous Informal Speech Dataset for Punctuation Restoration [0.8517406772939293]
We introduce SponSpeech, a punctuation restoration dataset derived from informal speech sources.
Our filtering pipeline examines the quality of both speech audio and transcription text.
We also carefully construct a challenging test set, aimed at evaluating models' ability to leverage audio information to predict otherwise grammatically ambiguous punctuation.
arXiv Detail & Related papers (2024-09-17T14:43:14Z)
- Handling Numeric Expressions in Automatic Speech Recognition [56.972851337263755]
We compare cascaded and end-to-end approaches to recognizing and formatting numeric expressions.
Results show that adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
arXiv Detail & Related papers (2024-07-18T09:46:19Z)
- Deepfake audio as a data augmentation technique for training automatic speech to text transcription models [55.2480439325792]
We propose a framework that approaches data augmentation based on deepfake audio.
A dataset of Indian-accented English speech was selected, ensuring the presence of a single accent.
arXiv Detail & Related papers (2023-09-22T11:33:03Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo-label-based semi-supervised training strategy that applies a language model to an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR [10.261890123213622]
We propose an on-the-fly data augmentation method for automatic speech recognition (ASR).
Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate training pairs.
arXiv Detail & Related papers (2021-04-03T13:00:00Z)
- Neural Data-to-Text Generation with LM-based Text Augmentation [27.822282190362856]
We show that a weakly supervised training paradigm is able to outperform fully supervised seq2seq models with less than 10% annotations.
By utilizing all annotated data, our model can boost the performance of a standard seq2seq model by over 5 BLEU points.
arXiv Detail & Related papers (2021-02-06T10:21:48Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Leverage Unlabeled Data for Abstractive Speech Summarization with Self-Supervised Learning and Back-Summarization [6.465251961564605]
Supervised approaches for Neural Abstractive Summarization require large annotated corpora that are costly to build.
We present a French meeting summarization task where reports are predicted based on the automatic transcription of the meeting audio recordings.
We report large improvements compared to the previous baseline for both approaches on two evaluation sets.
arXiv Detail & Related papers (2020-07-30T08:22:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.