Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
- URL: http://arxiv.org/abs/2204.00212v1
- Date: Fri, 1 Apr 2022 05:20:55 GMT
- Title: Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
- Authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon
- Abstract summary: Large-scale language models (LLMs) have been successfully applied to ASR N-best rescoring.
In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model.
- Score: 30.873546090458678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been
successfully applied to ASR N-best rescoring. However, whether or how they can
benefit competitive, near state-of-the-art ASR systems remains unexplored. In
this study, we incorporate LLM rescoring into one of the most competitive ASR
baselines: the Conformer-Transducer model. We demonstrate that consistent
improvement is achieved by the LLM's bidirectionality, pretraining, in-domain
finetuning and context augmentation. Furthermore, our lexical analysis sheds
light on how each of these components may be contributing to the ASR
performance.
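As a concrete illustration of the N-best rescoring recipe the abstract describes, here is a minimal sketch that scores each hypothesis with a bidirectional masked LM's pseudo-log-likelihood and interpolates it with the ASR score. It assumes PyTorch and Hugging Face transformers; the model name, interpolation weight `lam`, and toy N-best scores are illustrative, not the paper's Conformer-Transducer setup.

```python
# Minimal N-best rescoring sketch with a bidirectional masked LM (assumption:
# bert-base-uncased stands in for the paper's LLM; lam is a made-up weight).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(text: str) -> float:
    """BERT-style PLL: mask each token in turn and sum its log-probability."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the [CLS] (first) and [SEP] (last) special tokens.
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def rescore(nbest: list[tuple[str, float]], lam: float = 0.3) -> str:
    """Pick the hypothesis maximizing ASR log-score + lam * LLM score."""
    return max(nbest, key=lambda h: h[1] + lam * pseudo_log_likelihood(h[0]))[0]

# Toy N-best list of (hypothesis, ASR log-score) pairs -- made-up numbers.
nbest = [("the cat sat on the mat", -4.2), ("the cat sat on a mat", -4.0)]
print(rescore(nbest))
```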
Related papers
- CTC-Assisted LLM-Based Contextual ASR [40.6542391788212]
We propose a CTC-Assisted LLM-Based Contextual ASR model with an efficient filtering algorithm.
Our model attains WER/B-WER of 1.27%/3.67% and 2.72%/8.02% on the LibriSpeech test-clean and test-other sets when targeting recognition of rare long-tail words.
arXiv Detail & Related papers (2024-11-10T11:47:50Z)
- Exploiting Self-Supervised Constraints in Image Super-Resolution [72.35265021054471]
This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR.
SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential moving average (EMA) to enhance stability; a minimal EMA sketch follows this entry.
Empirical evaluations reveal that our SSC-SR framework delivers substantial enhancements on a variety of benchmark datasets, achieving an average increase of 0.1 dB over EDSR and 0.06 dB over SwinIR.
arXiv Detail & Related papers (2024-03-30T06:18:50Z)
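The EMA-updated target model mentioned in this entry is a common stabilization trick. A minimal sketch, assuming PyTorch; the `decay` value and the toy linear model are illustrative, not SSC-SR's actual configuration.

```python
# Minimal EMA target-model sketch (assumption: decay=0.999 and the toy
# Linear module are illustrative stand-ins, not SSC-SR's implementation).
import copy
import torch

def ema_update(target: torch.nn.Module, online: torch.nn.Module,
               decay: float = 0.999) -> None:
    """Blend target parameters toward the online model's parameters."""
    with torch.no_grad():
        for t, o in zip(target.parameters(), online.parameters()):
            t.mul_(decay).add_(o, alpha=1.0 - decay)

online = torch.nn.Linear(4, 4)   # stand-in for the online (trained) model
target = copy.deepcopy(online)   # target starts as a frozen copy
for p in target.parameters():
    p.requires_grad_(False)

# After each optimizer step on `online`, pull the target toward it.
ema_update(target, online)
```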
- End-to-End Speech Recognition: A Survey [68.35707678386949]
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements.
All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z)
- Enhancing and Adversarial: Improve ASR with Speaker Labels [49.73714831258699]
We propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort; a sketch of the basic gradient reversal mechanism follows this entry.
Detailed analysis and experimental verification show the optimal positions in the ASR neural network (NN) at which to apply speaker-enhancing and adversarial training.
Our best speaker-based MTL achieves a 7% relative improvement on the Switchboard Hub5'00 set.
arXiv Detail & Related papers (2022-11-11T17:40:08Z)
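The gradient reversal layer (GRL) named in this entry is a small, self-contained mechanism: identity on the forward pass, sign-flipped gradients on the backward pass. A minimal sketch of the plain (non-adaptive) GRL, assuming PyTorch; the `scale` factor is illustrative.

```python
# Minimal gradient reversal layer sketch (assumption: this is the plain GRL,
# not the paper's adaptive variant; `scale` is a made-up hyperparameter).
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, scale: float):
        ctx.scale = scale
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input: negated for x, none for scale.
        return -ctx.scale * grad_output, None

def grad_reverse(x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, scale)

# Toy check: gradients through the layer come out negated.
x = torch.ones(3, requires_grad=True)
grad_reverse(x).sum().backward()
print(x.grad)  # tensor([-1., -1., -1.])
```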
- FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition [34.40924909515384]
We propose to investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models.
We show that the proposed 'FeaRLESS learning features' perform better than systems without the proposed feature refinement loss for both the WSJ and Fearless Steps Challenge (FSC) corpora.
arXiv Detail & Related papers (2022-06-30T06:39:40Z)
- Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI [67.13999010060057]
We propose a novel approach to integrate the LF-MMI criterion into E2E ASR frameworks in both the training and decoding stages.
Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements.
arXiv Detail & Related papers (2021-12-05T07:30:17Z)
- Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks [10.723935272906461]
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored.
We introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective.
Our proposed approach outperforms baselines and conventional GAN-based adversarial models.
arXiv Detail & Related papers (2021-03-10T17:40:48Z)
- Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling [76.43479696760996]
We propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech recognition.
We show that the latency and accuracy of streaming ASR significantly benefit from weight sharing and joint training with full-context ASR; a sketch of the shared-weights, two-mask idea follows this entry.
arXiv Detail & Related papers (2020-10-12T21:12:56Z)
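The shared-weights idea in this entry can be pictured as one attention module driven by two masks: a causal mask for streaming and no mask for full-context recognition. A minimal sketch, assuming PyTorch; the toy dimensions and single attention layer are illustrative, not the paper's architecture.

```python
# Minimal dual-mode sketch: the same attention weights serve both modes
# (assumption: a single MultiheadAttention layer stands in for the model).
import torch

attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
x = torch.randn(1, 5, 16)  # (batch, time, features)

# Streaming mode: each frame may only attend to itself and the past.
T = x.size(1)
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
streaming_out, _ = attn(x, x, x, attn_mask=causal_mask)

# Full-context mode: identical weights, but every frame sees the whole utterance.
full_out, _ = attn(x, x, x)
```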
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.