Semi-Supervised Speech Recognition via Local Prior Matching
- URL: http://arxiv.org/abs/2002.10336v1
- Date: Mon, 24 Feb 2020 16:07:11 GMT
- Title: Semi-Supervised Speech Recognition via Local Prior Matching
- Authors: Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun
- Abstract summary: Local prior matching is a semi-supervised objective that distills knowledge from a strong prior.
We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques.
- Score: 42.311823406287864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For sequence transduction tasks like speech recognition, a strong structured
prior model encodes rich information about the target space, implicitly ruling
out invalid sequences by assigning them low probability. In this work, we
propose local prior matching (LPM), a semi-supervised objective that distills
knowledge from a strong prior (e.g. a language model) to provide learning
signal to a discriminative model trained on unlabeled speech. We demonstrate
that LPM is theoretically well-motivated, simple to implement, and superior to
existing knowledge distillation techniques under comparable settings. Starting
from a baseline trained on 100 hours of labeled speech, with an additional 360
hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on
clean and noisy test sets relative to a fully supervised model on the same
data.
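The core of the objective described above is to renormalize a strong prior (e.g. a language model) over a local set of beam-search hypotheses for an unlabeled utterance, then train the ASR model toward that distribution. A minimal sketch of that idea follows; the function names and the toy beam scores are illustrative assumptions, not the authors' implementation:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of log-scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def lpm_loss(lm_log_probs, asr_log_probs):
    """Sketch of a local-prior-matching loss for one unlabeled utterance.

    lm_log_probs:  log-scores the language-model prior assigns to each
                   beam-search hypothesis (the "local" prior support).
    asr_log_probs: log-probabilities the discriminative ASR model assigns
                   to the same hypotheses.

    The prior is renormalized over the beam, and the loss is the
    cross-entropy -sum_i q_i * log p_i, which pushes the ASR model
    toward hypotheses the prior considers likely.
    """
    q = softmax(lm_log_probs)  # renormalize the prior over the beam
    return -sum(qi * lp for qi, lp in zip(q, asr_log_probs))

# Toy example: three hypotheses from a beam, with made-up scores.
loss = lpm_loss([-1.0, -2.0, -3.0], [-0.5, -1.5, -4.0])
```

The loss is smallest when the ASR model concentrates probability on the hypotheses the prior favors, which is the distillation effect the abstract describes.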
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z) - Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z) - Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders.
We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models [13.456066434598155]
We address the problem of detecting speech directed to a device that does not contain a specific wake-word.
Specifically, we focus on audio coming from a touch-based invocation.
arXiv Detail & Related papers (2022-03-30T01:27:39Z) - Wake Word Detection with Alignment-Free Lattice-Free MMI [66.12175350462263]
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing 50%--90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
arXiv Detail & Related papers (2020-05-17T19:22:25Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.