BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
- URL: http://arxiv.org/abs/2211.00792v1
- Date: Wed, 2 Nov 2022 00:10:43 GMT
- Title: BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
- Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
- Abstract summary: We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model.
BECTRA is a transducer-based model that adopts BERT-CTC as its encoder and trains an ASR-specific decoder on a vocabulary suited to the target task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech
recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced
encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR
has been actively studied, aiming to utilize versatile linguistic knowledge for
generating accurate text. One crucial factor that makes this integration
challenging is the vocabulary mismatch: the vocabulary constructed for a
pre-trained LM is generally too large for E2E-ASR training and is likely to
mismatch the target ASR domain. To overcome this issue, we propose BECTRA,
an extended version of our previous BERT-CTC, which realizes BERT-based
E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model
that adopts BERT-CTC as its encoder and trains an ASR-specific decoder on a
vocabulary suited to the target task. Combining the transducer with
BERT-CTC, we also propose a novel inference algorithm that takes advantage
of both autoregressive and non-autoregressive decoding. Experimental
results on several ASR tasks, varying in amounts of data, speaking styles, and
languages, demonstrate that BECTRA outperforms BERT-CTC by effectively dealing
with the vocabulary mismatch while exploiting BERT knowledge.
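The vocabulary-mismatch handling described in the abstract can be sketched as a toy two-pass decode: a non-autoregressive CTC-style first pass in the large pre-trained-LM subword vocabulary, whose hypothesis is re-tokenized into a smaller task vocabulary for the ASR-specific decoder. The function names, the greedy CTC logic, and the character-level task vocabulary below are all illustrative assumptions, not the authors' implementation.

```python
def ctc_collapse(frame_labels, blank="_"):
    """Greedy CTC decoding: merge repeated labels, then drop blanks."""
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            collapsed.append(label)
        prev = label
    return collapsed

def retokenize(subwords, task_vocab):
    """Map LM-vocabulary subwords (BERT-style '##' continuations) onto a
    smaller task vocabulary -- here, characters -- so the decoder can
    operate in the vocabulary of interest."""
    text = "".join(sw[2:] if sw.startswith("##") else sw for sw in subwords)
    return [ch for ch in text if ch in task_vocab]

# Frame-level first-pass labels in the (large) BERT subword vocabulary.
frames = ["_", "hel", "hel", "_", "##lo", "##lo", "_"]
subword_hyp = ctc_collapse(frames)   # ['hel', '##lo']
char_vocab = set("abcdefghijklmnopqrstuvwxyz")
char_hyp = retokenize(subword_hyp, char_vocab)
print("".join(char_hyp))             # hello
```

In the actual model the second pass is a transducer decoder scoring over the task vocabulary, not a deterministic re-tokenization; the sketch only illustrates how the two vocabularies are bridged.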
Related papers
- Agent-driven Generative Semantic Communication with Cross-Modality and Prediction
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability that tracks semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
- Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder
We propose Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text.
We also develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations) to decode text from EEG sequences.
arXiv Detail & Related papers (2024-02-27T11:45:21Z)
- Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges
This survey focuses on approaches that apply pretrained transformer encoders like BERT to information retrieval (IR).
We group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion.
We find that, for specific tasks, fine-tuned BERT encoders still outperform, and at a lower deployment cost.
arXiv Detail & Related papers (2024-02-18T23:22:40Z)
- Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
We propose an Acoustic and Semantic Cooperative Decoder (ASCD) for ASR.
Unlike vanilla decoders that process acoustic and semantic features in two separate stages, ASCD integrates them cooperatively.
We show that ASCD significantly improves performance by leveraging acoustic and semantic information cooperatively.
arXiv Detail & Related papers (2023-05-23T13:25:44Z)
- Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks
We propose a solution that combines the Transducer and an Attention-based Encoder-Decoder (TAED) for speech-to-text tasks.
The new method leverages the attention-based decoder's strength in non-monotonic sequence-to-sequence learning while retaining the Transducer's streaming property.
We evaluate the proposed approach on the MuST-C dataset, and the findings demonstrate that TAED performs significantly better than the Transducer on offline automatic speech recognition (ASR) and speech-to-text translation (ST) tasks.
arXiv Detail & Related papers (2023-05-04T18:34:50Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- BERT-LID: Leveraging BERT to Improve Spoken Language Identification
Language identification is the task of automatically determining the language conveyed by a spoken segment.
Although language identification attains high accuracy on medium-length or long utterances, performance on short utterances remains far from satisfactory.
We propose an effective BERT-based language identification system (BERT-LID) to improve language identification performance.
arXiv Detail & Related papers (2022-03-01T10:01:25Z)
- Attention-based Multi-hypothesis Fusion for Speech Summarization
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
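The multi-hypothesis idea can be illustrated with a much simpler scheme than the paper's attention-based fusion: a per-position majority vote over several hypothetical ASR outputs, just to show how combining hypotheses attenuates individual recognition errors. The vote is a stand-in, not the proposed model.

```python
from collections import Counter

def majority_vote(hypotheses):
    """Combine equal-length word sequences by per-position voting."""
    fused = []
    for words in zip(*hypotheses):
        fused.append(Counter(words).most_common(1)[0][0])
    return fused

# Three noisy ASR hypotheses for the same utterance (illustrative).
hyps = [
    "the cat sat on the mat".split(),
    "the cap sat on the mat".split(),
    "the cat sad on the mat".split(),
]
print(" ".join(majority_vote(hyps)))  # the cat sat on the mat
```

Each hypothesis contains one error, but no error is shared across hypotheses, so the fused output recovers the correct transcript; the paper achieves a similar effect by letting the summarizer attend over all hypotheses.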
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
- Context-Aware Transformer Transducer for Speech Recognition
We present a novel context-aware transformer transducer (CATT) network that improves the state-of-the-art transformer-based ASR system by taking advantage of such contextual signals.
We show that CATT, using a BERT-based context encoder, improves the word error rate of the baseline transformer transducer and outperforms an existing deep contextual model by 24.2% and 19.4%, respectively.
arXiv Detail & Related papers (2021-11-05T04:14:35Z)
- Training ELECTRA Augmented with Multi-word Selection
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.