Related papers: Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

URL: http://arxiv.org/abs/2110.04056v1
Date: Fri, 8 Oct 2021 12:05:25 GMT
Title: Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
Authors: Shaoshi Ling, Chen Shen, Meng Cai, Zejun Ma
Abstract summary: We propose a novel approach that combines self-supervised representation learning and pseudo-labeling for end-to-end speech recognition models. Without any extra loss function, we use the Gradient Mask to optimize the model when training on pseudo-labels. In our semi-supervised experiments, the method improves model performance when training on pseudo-labels.
Score: 7.807021847783367
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the recent trend of semi-supervised speech recognition, both self-supervised representation learning and pseudo-labeling have shown promising results. In this paper, we propose a novel approach to combine their ideas for end-to-end speech recognition models. Without any extra loss function, we utilize the Gradient Mask to optimize the model when training on pseudo-labels. This method forces the speech recognition model to predict from the masked input to learn strong acoustic representations and make training robust to label noise. In our semi-supervised experiments, the method improves model performance when training on pseudo-labels, and it achieves competitive results compared with other semi-supervised approaches on the Librispeech 100 hours experiments.
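
The abstract does not spell out how the Gradient Mask works, so the following is only a minimal sketch of one plausible reading: frames of the input are masked, the model predicts from the masked input, and gradients reaching the encoder are kept only at the masked frames. All function names, the span-sampling scheme, and the zero fill value are assumptions made for illustration and are not taken from the paper.

```python
import random


def sample_span_mask(num_frames, span_len, mask_prob, rng):
    """Return a list of 0/1 flags, where 1 marks a masked frame.

    Hypothetical helper: each frame starts a masked span of `span_len`
    frames with probability `mask_prob`. Spans may overlap.
    """
    mask = [0] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span_len, num_frames)):
                mask[t] = 1
    return mask


def apply_input_mask(features, mask, fill=0.0):
    """Replace every masked frame with a constant fill vector (assumed)."""
    return [
        [fill] * len(frame) if m else list(frame)
        for frame, m in zip(features, mask)
    ]


def apply_gradient_mask(frame_grads, mask):
    """Zero the per-frame gradients at unmasked positions.

    `frame_grads` stands in for the gradient of the loss with respect to
    the encoder output at each frame. Only masked frames keep a gradient.
    """
    return [[g * m for g in grad] for grad, m in zip(frame_grads, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[1.0, 2.0] for _ in range(8)]
    mask = sample_span_mask(len(feats), span_len=2, mask_prob=0.3, rng=rng)
    masked_feats = apply_input_mask(feats, mask)
    # Dummy gradients; in a real system these would come from backpropagation.
    grads = [[0.5, -0.5] for _ in range(8)]
    print(mask)
    print(apply_gradient_mask(grads, mask))
```

In an automatic-differentiation framework the same effect would normally be obtained with a stop-gradient or a backward hook rather than by editing gradients by hand; the list-based version is kept only so the idea can be read without any dependencies.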

Related papers

Pre-trained Vision-Language Models Assisted Noisy Partial Label Learning [40.89255396643592]
This paper focuses on learning from noisy partial labels annotated by pre-trained vision-language models (VLMs). We propose an innovative collaborative consistency regularization (Co-Reg) method to address the noise generated by pre-trained models. Our method can leverage few-shot manually annotated valid labels to further enhance its performance.
arXiv Detail & Related papers (2025-06-03T12:48:54Z)
Adversarial Representation Learning for Robust Privacy Preservation in Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings. We propose a novel adversarial training method for learning representations of audio recordings. The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
arXiv Detail & Related papers (2023-04-29T08:39:55Z)
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models [95.97506031821217]
We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training. The method requires a short (3 seconds) sample from the target person, and generation is steered at inference time, without any training steps.
arXiv Detail & Related papers (2022-06-05T19:45:29Z)
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training [33.79711018198589]
We extend the existing mixture invariant training criterion to exploit both unpaired clean speech and real noisy data. It is found that the unpaired clean speech is crucial to improve quality of separated speech from real noisy speech. The proposed method also performs remixing of processed and unprocessed signals to alleviate the processing artifacts.
arXiv Detail & Related papers (2022-05-03T19:37:58Z)
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task. This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
Curriculum optimization for low-resource speech recognition [4.803994937990389]
We propose an automated curriculum learning approach to optimize the sequence of training examples. We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions.
arXiv Detail & Related papers (2022-02-17T19:47:50Z)
Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space. We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
Self-supervised Learning with Random-projection Quantizer for Speech Recognition [51.24368930992091]
We present a simple and effective self-supervised learning approach for speech recognition. The approach learns a model to predict masked speech signals, in the form of discrete labels. It achieves similar word-error-rates as previous work using self-supervised learning with non-streaming models.
arXiv Detail & Related papers (2022-02-03T21:29:04Z)
Streaming end-to-end speech recognition with jointly trained neural feature enhancement [20.86554979122057]
We present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoCha) jointly trained with enhancement layers. We introduce two training strategies: Gradual Application of Enhanced Features (GAEF) and Gradual Reduction of Enhanced Loss (GREL).
arXiv Detail & Related papers (2021-05-04T02:25:41Z)
Multi-Objective Interpolation Training for Robustness to Label Noise [17.264550056296915]
We show that standard supervised contrastive learning degrades in the presence of label noise. We propose a novel label noise detection method that exploits the robust feature representations learned via contrastive learning. Experiments on synthetic and real-world noise benchmarks demonstrate that MOIT/MOIT+ achieves state-of-the-art results.
arXiv Detail & Related papers (2020-12-08T15:01:54Z)
Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation. We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining. Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.