Exploring Energy-based Language Models with Different Architectures and
Training Methods for Speech Recognition
- URL: http://arxiv.org/abs/2305.12676v3
- Date: Mon, 29 May 2023 06:38:21 GMT
- Title: Exploring Energy-based Language Models with Different Architectures and
Training Methods for Speech Recognition
- Authors: Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao
- Abstract summary: Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences.
In this paper, we explore different architectures of energy functions and different training methods to investigate the capabilities of ELMs in rescoring for speech recognition.
- Score: 23.970716487502273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Energy-based language models (ELMs) parameterize an unnormalized distribution
for natural sentences and are radically different from popular autoregressive
language models (ALMs). As an important application, ELMs have been
successfully used as a means for calculating sentence scores in speech
recognition, but prior work relies on older CNN or LSTM networks. Recent
progress in Transformer networks and large pretrained models such as BERT and
GPT2 opens new possibilities for further advancing ELMs. In this paper, we explore
different architectures of energy functions and different training methods to
investigate the capabilities of ELMs in rescoring for speech recognition, all
using large pretrained models as backbones.
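To make the rescoring setup concrete, here is a minimal sketch of how an ELM built on a pretrained Transformer backbone might assign an unnormalized energy to each hypothesis in an n-best list. The BERT backbone, mean-pooling head, and linear interpolation weight are illustrative assumptions, not the paper's exact configuration (the paper explores several energy architectures and training methods).

```python
# Minimal sketch (not the paper's exact setup): an energy-based LM that
# maps a sentence to a scalar energy E(x) using a pretrained Transformer
# backbone, then rescores an ASR n-best list. Lower energy = more likely.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TransformerELM(nn.Module):
    def __init__(self, backbone_name="bert-base-uncased"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        # Scalar head: one illustrative choice of energy function
        self.head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token states, then project to a scalar energy
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        return self.head(pooled).squeeze(-1)  # shape: (batch,)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
elm = TransformerELM()

def rescore(nbest, am_scores, lm_weight=0.5):
    """Combine acoustic-model scores with -E(x) from the ELM."""
    batch = tokenizer(nbest, padding=True, return_tensors="pt")
    with torch.no_grad():
        energies = elm(batch["input_ids"], batch["attention_mask"])
    total = torch.tensor(am_scores) - lm_weight * energies
    return nbest[int(total.argmax())]

print(rescore(["i saw the cat", "eye saw the cat"], [0.10, 0.12]))
```

Because the normalizing constant of an ELM is intractable, the energy head would in practice be trained with a sampling- or noise-contrastive-style objective rather than maximum likelihood.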
Related papers
- MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
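The freeze-then-extend pattern summarized above can be illustrated in a few lines; the module names, sizes, and soft gating below are assumptions for illustration, not the paper's actual MoE-CT configuration.

```python
# Illustrative sketch: freeze the base LM's parameters, then train only an
# appended mixture-of-experts block on new (e.g., low-resource) data.
import torch
import torch.nn as nn

class MoEBlock(nn.Module):
    def __init__(self, d_model=768, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, h):
        # Soft gating over experts; h: (batch, seq, d_model)
        w = torch.softmax(self.gate(h), dim=-1)                   # (B, T, E)
        outs = torch.stack([e(h) for e in self.experts], dim=-1)  # (B, T, D, E)
        return h + (outs * w.unsqueeze(2)).sum(-1)  # residual combine

def freeze_base_and_extend(base_lm, d_model=768):
    for p in base_lm.parameters():
        p.requires_grad = False  # base LM stays intact
    return MoEBlock(d_model)     # only this module receives gradients
```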
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
- Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition [12.77573161345651]
This paper proposes integrating a pre-trained speech representation model and a large language model (LLM) for E2E ASR.
The proposed model enables the optimization of the entire ASR process, including acoustic feature extraction and acoustic and language modeling.
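One common way to realize such an integration, and a plausible reading of the summary above, is to project the speech encoder's outputs into the LLM's embedding space and let the LLM decode text conditioned on them. The sketch below assumes this projector-based coupling and an LLM that accepts input embeddings directly; it is not the paper's code.

```python
# Hedged sketch of coupling a pretrained speech encoder to an LLM for
# E2E ASR: a small projector maps acoustic representations into the
# LLM's embedding space, and the whole pipeline is trained end to end.
import torch
import torch.nn as nn

class SpeechLLMASR(nn.Module):
    def __init__(self, speech_encoder, llm, d_speech=1024, d_llm=4096):
        super().__init__()
        self.speech_encoder = speech_encoder  # e.g., a pretrained wav2vec-style model
        self.llm = llm                        # decoder-only LM called on embeddings
        self.projector = nn.Linear(d_speech, d_llm)

    def forward(self, acoustic_feats, text_embeds):
        speech_h = self.speech_encoder(acoustic_feats)  # (B, T_s, d_speech)
        speech_embeds = self.projector(speech_h)        # (B, T_s, d_llm)
        # Prepend projected speech tokens to the text embeddings; the LLM
        # then models P(text | speech), keeping every component trainable.
        inputs = torch.cat([speech_embeds, text_embeds], dim=1)
        return self.llm(inputs)
```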
arXiv Detail & Related papers (2023-12-06T18:34:42Z)
- Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
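A simplified sketch of the importance-guided adaptation named in the title: split a dense FFN's hidden units into experts, where each expert keeps the highest-importance units (shared) plus a disjoint slice of the rest. The importance scores here are stand-ins for whatever metric the paper actually uses.

```python
# Simplified sketch: convert one FFN weight matrix into expert slices,
# sharing the top-importance rows across all experts.
import torch

def split_ffn_into_experts(w1, importance, n_experts=4, n_shared=512):
    """w1: (d_hidden, d_model) FFN weight; importance: (d_hidden,) scores."""
    order = torch.argsort(importance, descending=True)
    shared, rest = order[:n_shared], order[n_shared:]
    per_expert = len(rest) // n_experts
    experts = []
    for i in range(n_experts):
        own = rest[i * per_expert:(i + 1) * per_expert]
        idx = torch.cat([shared, own])
        experts.append(w1[idx])  # each expert reuses top units + its own slice
    return experts
```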
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
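A hedged sketch of the factorization described above: the transducer's prediction side is split into a blank predictor and a vocabulary predictor, where the latter behaves like a standalone LM and can be adapted on out-of-domain text alone. The LSTM layers and sizes are illustrative, and the interaction with the acoustic encoder is omitted.

```python
# Sketch: separate prediction branches for blank vs. vocabulary, so
# text-only LM adaptation touches only the vocabulary branch.
import torch
import torch.nn as nn

class FactorizedPredictor(nn.Module):
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blank_net = nn.LSTM(d_model, d_model, batch_first=True)
        self.vocab_lm = nn.LSTM(d_model, d_model, batch_first=True)  # adaptable LM
        self.blank_out = nn.Linear(d_model, 1)
        self.vocab_out = nn.Linear(d_model, vocab_size)

    def forward(self, prev_tokens):
        e = self.embed(prev_tokens)
        blank_h, _ = self.blank_net(e)
        vocab_h, _ = self.vocab_lm(e)
        # Vocabulary logits come from the LM branch; the blank score is
        # predicted separately, so adapting vocab_lm on text leaves it intact.
        return self.blank_out(blank_h), self.vocab_out(vocab_h)
```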
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- TEASEL: A Transformer-Based Speech-Prefixed Language Model [4.014524824655106]
Multimodal language analysis aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions.
Lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models.
Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is not usually attainable due to insufficient data.
arXiv Detail & Related papers (2021-09-12T14:08:57Z)
- Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models [61.768082640087]
We explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders for natural language understanding tasks.
Experiments show that EBM training helps the model reach calibration that is competitive with strong baselines.
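A minimal sketch of one common way to carry out such joint training, in the JEM style: reuse a classifier's logits to define an energy E(x) = -logsumexp(logits), so the encoder jointly models P(y|x) and an unnormalized P(x). The contrastive term below is a simplified NCE-like surrogate (noise log-densities omitted), not necessarily the paper's exact objective.

```python
# Sketch: joint discriminative + energy-based objective over one encoder.
import torch
import torch.nn.functional as F

def energy(logits):
    # E(x) = -logsumexp over class logits; lower energy = more likely x
    return -torch.logsumexp(logits, dim=-1)

def joint_loss(data_logits, labels, noise_logits, alpha=1.0):
    ce = F.cross_entropy(data_logits, labels)  # discriminative term
    # Push data energies down and noise energies up (binary logistic form).
    gen = F.softplus(energy(data_logits)).mean() + \
          F.softplus(-energy(noise_logits)).mean()
    return ce + alpha * gen
```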
arXiv Detail & Related papers (2021-01-18T01:41:31Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines such as libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
- A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning [32.59760685342343]
Probabilistic Latent Variable Models provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
In this work, we propose ConvDMM, a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks.
When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods.
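For orientation, here is a hedged sketch of a deep Markov model's generative process as described above: latent states follow a Gaussian transition with neural-network mean and scale, and observations come from a neural emission. This shows the model family only, not ConvDMM's specific convolutional encoder or inference network.

```python
# Sketch of the generative process: z_t ~ N(mu(z_{t-1}), sigma(z_{t-1})),
# x_t = emit(z_t). Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DeepMarkovModel(nn.Module):
    def __init__(self, z_dim=64, x_dim=80, h=256):
        super().__init__()
        self.trans = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(),
                                   nn.Linear(h, 2 * z_dim))  # mean, log-scale
        self.emit = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(),
                                  nn.Linear(h, x_dim))

    def generate(self, steps, batch=1):
        z = torch.zeros(batch, self.trans[0].in_features)
        xs = []
        for _ in range(steps):
            mu, log_sigma = self.trans(z).chunk(2, dim=-1)
            z = mu + log_sigma.exp() * torch.randn_like(mu)  # sample z_t
            xs.append(self.emit(z))                          # x_t | z_t
        return torch.stack(xs, dim=1)  # (batch, steps, x_dim)
```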
arXiv Detail & Related papers (2020-06-03T21:50:20Z)
- EEG based Continuous Speech Recognition using Transformers [13.565270550358397]
We investigate continuous speech recognition from electroencephalography (EEG) features using an end-to-end transformer-based automatic speech recognition (ASR) model.
Our results show that the transformer-based model trains faster than recurrent neural network (RNN) based sequence-to-sequence EEG models.
arXiv Detail & Related papers (2019-12-31T08:36:59Z)