A context-aware knowledge transferring strategy for CTC-based ASR
- URL: http://arxiv.org/abs/2210.06244v1
- Date: Wed, 12 Oct 2022 14:31:38 GMT
- Title: A context-aware knowledge transferring strategy for CTC-based ASR
- Authors: Ke-Han Lu, Kuan-Yu Chen
- Abstract summary: Methods based on the connectionist temporal classification (CTC) remain a dominant stream.
We propose a context-aware knowledge transferring strategy, consisting of a knowledge transferring module and a context-aware training strategy, for CTC-based ASR.
A knowledge-injected context-aware CTC-based ASR built upon the wav2vec2.0 is presented in this paper.
- Score: 9.500518278458905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive automatic speech recognition (ASR) modeling has received
increasing attention recently because of its fast decoding speed and superior
performance. Among representatives, methods based on the connectionist temporal
classification (CTC) remain a dominant stream. However, a theoretically
inherent flaw, the assumption of conditional independence between output tokens, creates a
performance barrier for this family of methods. To mitigate the challenge, we
propose a context-aware knowledge transferring strategy, consisting of a
knowledge transferring module and a context-aware training strategy, for
CTC-based ASR. The former is designed to distill linguistic information from a
pre-trained language model, and the latter is framed to modulate the
limitations caused by the conditional independence assumption. As a result, a
knowledge-injected context-aware CTC-based ASR built upon the wav2vec2.0 is
presented in this paper. A series of experiments on the AISHELL-1 and AISHELL-2
datasets demonstrate the effectiveness of the proposed method.
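The abstract describes a knowledge transferring module that distills linguistic information from a pre-trained language model into the CTC-based acoustic model. A common way to frame such distillation is an interpolated objective: the CTC loss plus a weighted representation-matching term between the acoustic encoder's states and the language model's hidden states. The sketch below is illustrative only, assuming a cosine-distance matching term and a hypothetical weight `lambda_kd`; it is not the paper's exact formulation.

```python
# Minimal sketch of a distillation-augmented CTC objective.
# lambda_kd and the cosine-distance matching term are assumptions for
# illustration, not details taken from the paper.
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def distillation_loss(acoustic_states, lm_states):
    """Average cosine distance between aligned acoustic and LM representations."""
    pairs = list(zip(acoustic_states, lm_states))
    return sum(cosine_distance(a, t) for a, t in pairs) / len(pairs)

def total_loss(ctc_loss, acoustic_states, lm_states, lambda_kd=0.5):
    """L = L_CTC + lambda_kd * L_distill; lambda_kd is a tunable hyperparameter."""
    return ctc_loss + lambda_kd * distillation_loss(acoustic_states, lm_states)
```

When the acoustic representations already match the language-model targets, the distillation term vanishes and only the CTC loss remains, so the weight trades off acoustic fit against injected linguistic knowledge.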
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach [74.25870052841226]
A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing, and artificial intelligence (AI) technologies to enable many connected intelligence services.
Wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints.
A novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems.
arXiv Detail & Related papers (2023-04-25T00:15:00Z)
- Contextual-Utterance Training for Automatic Speech Recognition [65.4571135368178]
We propose a contextual-utterance training technique which makes use of the previous and future contextual utterances.
Also, we propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems.
The proposed technique reduces the WER by more than 6% relative and the average last-token emission latency by more than 40 ms.
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
- Distilling the Knowledge of BERT for CTC-based ASR [38.345330002791606]
We propose to distill the knowledge of BERT for CTC-based ASR.
CTC-based ASR learns the knowledge of BERT during training and does not use BERT during testing.
We show that our method improves the performance of CTC-based ASR without the cost of inference speed.
arXiv Detail & Related papers (2022-09-05T16:08:35Z)
- The THUEE System Description for the IARPA OpenASR21 Challenge [12.458730613670316]
This paper describes the THUEE team's speech recognition system for the IARPA Open Automatic Speech Recognition Challenge (OpenASR21)
We achieve outstanding results under both the Constrained and Constrained-plus training conditions.
We find that the feature extractor plays an important role when applying the wav2vec2.0 pre-trained model to the encoder-decoder based CTC/Attention ASR architecture.
arXiv Detail & Related papers (2022-06-29T14:03:05Z)
- Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods.
The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead.
Results reveal that the MBDL outperforms by a significant margin the SIC receiver with imperfect CSIR.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
- Improving CTC-based speech recognition via knowledge transferring from pre-trained language models [30.599901925058873]
We propose two knowledge transferring methods to improve CTC-based models.
The first method is based on representation learning, in which the CTC-based models use the representation produced by BERT as an auxiliary learning target.
The second method is based on joint classification learning, which combines GPT2 for text modeling with a hybrid CTC/attention architecture.
arXiv Detail & Related papers (2022-02-22T11:30:55Z)
- Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions [14.376418789524783]
We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer.
Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed.
arXiv Detail & Related papers (2021-04-06T18:00:03Z)
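The intermediate-loss approach above is commonly formulated as an interpolation between the final-layer CTC loss and the average of the auxiliary CTC losses from intermediate layers. The sketch below assumes that interpolated form with a hypothetical mixing weight `w`; the exact weighting in the cited paper may differ.

```python
# Sketch of an interpolated intermediate-CTC objective:
# (1 - w) * final CTC loss + w * mean of auxiliary intermediate CTC losses.
# The weight w is an illustrative hyperparameter.
def intermediate_ctc_objective(final_ctc_loss, intermediate_ctc_losses, w=0.5):
    aux = sum(intermediate_ctc_losses) / len(intermediate_ctc_losses)
    return (1.0 - w) * final_ctc_loss + w * aux
```

Setting `w = 0` recovers the plain CTC objective, so the auxiliary supervision is strictly an additive training signal and leaves the model architecture and decoding untouched.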
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.