Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition
- URL: http://arxiv.org/abs/2108.07789v1
- Date: Thu, 29 Jul 2021 16:53:37 GMT
- Title: Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition
- Authors: Xianrui Zheng, Chao Zhang and Philip C. Woodland
- Abstract summary: We present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR).
A conversion method is proposed to compute the correct language prior probability based on bidirectional LM outputs.
The proposed conversion for language prior probabilities enables BERT to achieve an extra 3% relative WERR.
- Score: 14.82259273703819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) pre-trained on massive amounts of text, in particular
bidirectional encoder representations from Transformers (BERT), generative
pre-training (GPT), and GPT-2, have become a key technology for many natural
language processing tasks. In this paper, we present results using fine-tuned
GPT, GPT-2, and their combination for automatic speech recognition (ASR).
Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct
product of its output probabilities is no longer a valid language prior
probability. A conversion method is proposed to compute the correct language
prior probability based on bidirectional LM outputs in a mathematically exact
way. Experimental results on the widely used AMI and Switchboard ASR tasks
showed that the combination of the fine-tuned GPT and GPT-2 outperformed the
combination of three neural LMs with different architectures trained from
scratch on the in-domain text by up to a 12% relative word error rate reduction
(WERR). Furthermore, the proposed conversion for language prior probabilities
enables BERT to achieve an extra 3% relative WERR, and the combination of BERT,
GPT, and GPT-2 results in further improvements.
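To make the rescoring setup concrete, the following is a minimal sketch, assuming off-the-shelf Hugging Face Transformers checkpoints ("gpt2", "bert-base-uncased") rather than the paper's fine-tuned models: it interpolates a GPT-2 sentence log-probability into a first-pass ASR score for N-best rescoring, and also computes the naive BERT masked-token sum whose direct product, as the abstract notes, is not a valid language prior. The interpolation weight `lm_scale` and the N-best format are illustrative assumptions, and the paper's exact conversion method is not reproduced here.

```python
# Hedged sketch: N-best rescoring with a GPT-2 LM score, plus the naive
# bidirectional BERT score discussed in the abstract. Checkpoint names,
# `lm_scale`, and the (hypothesis, first_pass_score) format are assumptions.
import torch
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          GPT2LMHeadModel, GPT2TokenizerFast)

gpt2_tok = GPT2TokenizerFast.from_pretrained("gpt2")
gpt2_lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert_lm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def gpt2_logprob(sentence: str) -> float:
    """Sentence log-probability under the unidirectional GPT-2 LM."""
    ids = gpt2_tok(sentence, return_tensors="pt").input_ids
    # With labels == input_ids, the model returns the mean cross-entropy
    # over the predicted tokens, so scale back to a sum of log-probs.
    loss = gpt2_lm(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

@torch.no_grad()
def bert_naive_score(sentence: str) -> float:
    """Naive bidirectional score: sum of masked-token log-probabilities.
    Per the abstract, the direct product of these output probabilities is
    not a valid language prior; the paper's conversion is not shown here."""
    ids = bert_tok(sentence, return_tensors="pt").input_ids
    total = 0.0
    for pos in range(1, ids.size(1) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[0, pos] = bert_tok.mask_token_id
        log_probs = torch.log_softmax(bert_lm(masked).logits[0, pos], dim=-1)
        total += log_probs[ids[0, pos]].item()
    return total

def rescore(nbest, lm_scale=0.3):
    """nbest: list of (hypothesis_text, first_pass_score) pairs."""
    return max(nbest, key=lambda h: h[1] + lm_scale * gpt2_logprob(h[0]))
```

In the paper's setting, the GPT and GPT-2 scores come from fine-tuned models, and BERT's outputs are first converted into a valid language prior before interpolation.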
Related papers
- Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models [74.71484979138161]
Grapheme-to-phoneme (G2P) conversion is a crucial step in Text-to-Speech (TTS) systems.
Inspired by the success of Large Language Models (LLMs) in handling context-aware scenarios, contextual G2P conversion systems are proposed.
The efficacy of incorporating in-context knowledge retrieval (ICKR) into G2P conversion systems is demonstrated on the Librig2p dataset.
arXiv Detail & Related papers (2024-11-12T05:38:43Z) - BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding [24.54436986074267]
We introduce BELT-2, a pioneering multi-task model designed to enhance both encoding and decoding performance from EEG signals.
BELT-2 is the first work to 1) adopt byte-pair encoding (BPE)-level EEG-language alignment and 2) integrate multi-task training and decoding in the EEG domain.
These efforts make BELT-2 the first work in the field capable of decoding coherent and readable sentences from non-invasive brain signals.
arXiv Detail & Related papers (2024-08-28T12:30:22Z) - Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder [69.7813498468116]
We propose Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text.
We also develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations) to decode text from EEG sequences.
arXiv Detail & Related papers (2024-02-27T11:45:21Z) - An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, with adapter tuning consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than for BERT, and that the methods are less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z) - Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we thus investigate multilingual TTI (mTTI) and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z) - Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning [96.13057811149827]
We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model without fine-tuning it.
IPA guides a large base model during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective.
It consistently brings significant improvements over off-the-shelf language models.
arXiv Detail & Related papers (2023-05-24T11:52:55Z) - AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling [33.18577107062907]
Variational Auto-Encoder (VAE) has become the de-facto learning paradigm in achieving both representation learning and generation for natural language.
Existing VAE-based language models either employ elementary RNNs or fine-tune two pre-trained language models (PLMs) for any downstream task, which requires huge energy consumption.
In this paper, we introduce the first VAE framework empowered with adaptive GPT-2s (AdaVAE).
arXiv Detail & Related papers (2022-05-12T03:22:07Z) - Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z) - Variational Latent-State GPT for Semi-supervised Task-Oriented Dialog Systems [24.667353107453824]
Variational Latent-State GPT model (VLS-GPT) is the first to combine the strengths of the two approaches.
We develop the strategy of sampling-then-forward-computation, which successfully overcomes the memory explosion issue of using GPT in variational learning.
VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.
arXiv Detail & Related papers (2021-09-09T14:42:29Z) - Prior Art Search and Reranking for Generated Patent Text [1.8275108630751844]
To our knowledge, this work is the first to implement a reranking system that retrospectively identifies the most similar inputs to a GPT model based on its output.
arXiv Detail & Related papers (2020-09-19T01:16:18Z) - Assessing Discourse Relations in Language Generation from GPT-2 [37.30382375828105]
GPT-2 is suited for generation tasks given its left-to-right language modeling objective.
We study the validity of explicit discourse relations in GPT-2's outputs under both organic generation and fine-tuned scenarios.
arXiv Detail & Related papers (2020-04-26T23:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.