Back from the future: bidirectional CTC decoding using future
information in speech recognition
- URL: http://arxiv.org/abs/2110.03326v1
- Date: Thu, 7 Oct 2021 10:42:02 GMT
- Title: Back from the future: bidirectional CTC decoding using future
information in speech recognition
- Authors: Namkyu Jung, Geonmin Kim, Han-Gyu Kim
- Abstract summary: We propose a simple but effective method to decode the output of the Connectionist Temporal Classification (CTC) model using a bi-directional neural language model.
The proposed method based on bi-directional beam search takes advantage of the CTC greedy decoding output to represent the noisy future information.
- Score: 3.386091225912298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a simple but effective method to decode the output
of a Connectionist Temporal Classification (CTC) model using a bi-directional neural
language model. The bidirectional language model uses the future as well as the
past information in order to predict the next output in the sequence. The
proposed method based on bi-directional beam search takes advantage of the CTC
greedy decoding output to represent the noisy future information. Experiments
on the LibriSpeech dataset demonstrate the superiority of our proposed method
compared to baselines using unidirectional decoding. In particular, the boost
in accuracy is most apparent at the start of a sequence, which is the most
erroneous part for existing systems based on unidirectional decoding.
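
A minimal sketch of the decoding idea the abstract describes, assuming a bidirectional LM that exposes an lm.log_prob(token, left_context, right_context) interface and per-position acoustic scores token_log_probs (both illustrative assumptions, not the authors' released implementation): CTC greedy decoding first produces a noisy hypothesis, and its tail then serves as the "future" context when the beam search scores candidate tokens.
```python
# Illustrative sketch only: CTC greedy decoding supplies noisy future context
# for a bidirectional LM inside a toy beam search. The `lm` interface, `vocab`,
# and `token_log_probs` are assumptions made for this example.

import numpy as np


def ctc_greedy_decode(frame_log_probs, blank_id=0):
    """Standard CTC greedy decoding: frame-wise argmax, collapse repeats, drop blanks."""
    best_path = np.argmax(frame_log_probs, axis=-1)
    out, prev = [], None
    for label in best_path:
        if label != blank_id and label != prev:
            out.append(int(label))
        prev = label
    return out


def bidirectional_beam_search(token_log_probs, greedy_tokens, lm, vocab,
                              beam_width=8, alpha=0.5):
    """Toy beam search over output positions.

    token_log_probs[i][v]: assumed acoustic score of token v at output position i
    (e.g. CTC posteriors aligned to the greedy segmentation).
    greedy_tokens: CTC greedy output; its tail greedy_tokens[i+1:] stands in for
    the noisy future context the bidirectional LM conditions on.
    """
    beams = [([], 0.0)]  # (decoded prefix, cumulative score)
    for i in range(len(greedy_tokens)):
        future = greedy_tokens[i + 1:]  # noisy future information
        candidates = []
        for prefix, score in beams:
            for v in vocab:
                acoustic = token_log_probs[i][v]
                # Bidirectional LM: condition on the decoded past and the greedy future.
                lm_score = lm.log_prob(v, left_context=prefix, right_context=future)
                candidates.append((prefix + [v], score + acoustic + alpha * lm_score))
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0] if greedy_tokens else []
```
A full CTC prefix beam search would also track blank and non-blank path probabilities per prefix; that bookkeeping is omitted here to keep the sketch focused on how the greedy output is reused as future context.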
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units [64.61596752343837]
We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and predicts discrete acoustic units.
We enhance the model performance by subword prediction in the first-pass decoder.
We show that the proposed methods boost the performance even when predicting spectrogram in the second pass.
arXiv Detail & Related papers (2022-12-15T18:58:28Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C can relatively reduce the word error rate (WER) by 19.2% over the method without decoder pre-training.
arXiv Detail & Related papers (2022-03-31T15:33:56Z)
- Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation [9.279287354043289]
We propose Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation (SBD-NMT).
We deploy a backward decoder which acts as an effective regularization method for the forward decoder.
Experiments show that our method is significantly better than the strong Transformer baselines on multiple machine translation data sets.
arXiv Detail & Related papers (2022-03-10T09:21:28Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.