Discriminative Self-training for Punctuation Prediction
- URL: http://arxiv.org/abs/2104.10339v1
- Date: Wed, 21 Apr 2021 03:32:47 GMT
- Title: Discriminative Self-training for Punctuation Prediction
- Authors: Qian Chen, Wen Wang, Mengzhe Chen, Qinglin Zhang
- Abstract summary: Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of the ASR transcripts.
However, achieving good performance on punctuation prediction often requires large amounts of labeled speech transcripts.
We propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts.
- Score: 5.398944179152948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Punctuation prediction for automatic speech recognition (ASR) output
transcripts plays a crucial role in improving the readability of the ASR
transcripts and the performance of downstream natural language
processing applications. However, achieving good performance on punctuation
prediction often requires large amounts of labeled speech transcripts, which
are expensive and laborious to obtain. In this paper, we propose a Discriminative
Self-Training approach with weighted loss and discriminative label smoothing to
exploit unlabeled speech transcripts. Experimental results on the English
IWSLT2011 benchmark test set and an internal Chinese spoken language dataset
demonstrate that the proposed approach achieves significant improvement on
punctuation prediction accuracy over strong baselines including BERT, RoBERTa,
and ELECTRA models. The proposed Discriminative Self-Training approach
outperforms the vanilla self-training approach. We establish a new
state-of-the-art (SOTA) on the IWSLT2011 test set, outperforming the current
SOTA model by 1.3% absolute gain on F$_1$.
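To make the two loss components mentioned in the abstract concrete, below is a minimal PyTorch sketch of a weighted, label-smoothed loss for token-level punctuation classification. The label set, class weights, and the plain uniform smoothing are illustrative assumptions; the paper's discriminative label smoothing and its self-training procedure are not reproduced here.
```python
# Minimal sketch: weighted cross-entropy with (uniform) label smoothing for
# punctuation prediction as token classification. Names and weights are
# illustrative, not the authors' exact formulation.
import torch
import torch.nn.functional as F

PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]  # assumed label set

def weighted_smoothed_loss(logits, targets, class_weights, smoothing=0.1):
    """Token-level cross-entropy with per-class weights and label smoothing."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        # Put most probability mass on the gold label and spread the rest
        # uniformly over the remaining classes.
        smoothed = torch.full_like(log_probs, smoothing / (num_classes - 1))
        smoothed.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    per_token = -(smoothed * log_probs).sum(dim=-1)    # shape: (batch, seq_len)
    per_token = per_token * class_weights[targets]     # up-weight rare punctuation classes
    return per_token.mean()

# Toy usage: 2 sequences of 5 tokens over 4 punctuation classes.
logits = torch.randn(2, 5, len(PUNCT_LABELS))
targets = torch.randint(0, len(PUNCT_LABELS), (2, 5))
class_weights = torch.tensor([0.5, 1.5, 1.5, 2.0])  # illustrative, not the paper's values
print(weighted_smoothed_loss(logits, targets, class_weights).item())
```
In a self-training setup, the same loss would also be applied to pseudo-labeled unlabeled transcripts, typically with a lower weight than the human-labeled data.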
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution [5.1660803395535835]
Self-supervised learning (SSL) has shown stellar performance compared to traditional methods.
However, SSL-based ASA systems are faced with at least three data-related challenges.
These challenges include limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels.
arXiv Detail & Related papers (2024-04-11T09:06:49Z)
- Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification [19.893213508284813]
Self-supervised adaptive pre-training is proposed to adapt the pre-trained model to the target domain and languages of the downstream task.
We show that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains up to 40.1% for under-represented languages.
arXiv Detail & Related papers (2023-12-12T14:58:08Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Simple and Effective Unsupervised Speech Translation [68.25022245914363]
We study a simple and effective approach to build speech translation systems without labeled data.
We present an unsupervised domain adaptation technique for pre-trained speech models.
Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
arXiv Detail & Related papers (2022-10-18T22:26:13Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training [33.02912456062474]
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.
We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST2 speech translation.
arXiv Detail & Related papers (2021-10-20T00:59:36Z)
- Robust Prediction of Punctuation and Truecasing for Medical ASR [18.08508027663331]
This paper proposes a conditional joint modeling framework for prediction of punctuation and truecasing.
We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data.
arXiv Detail & Related papers (2020-07-04T07:15:13Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)