Memory Augmented Lookup Dictionary based Language Modeling for Automatic
Speech Recognition
- URL: http://arxiv.org/abs/2301.00066v1
- Date: Fri, 30 Dec 2022 22:26:57 GMT
- Title: Memory Augmented Lookup Dictionary based Language Modeling for Automatic
Speech Recognition
- Authors: Yukun Feng and Ming Tu and Rui Xia and Chuanzeng Huang and Yuxuan Wang
- Abstract summary: We propose a new memory augmented lookup dictionary based Transformer architecture for LM.
The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens.
Our proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail token error rate.
- Score: 20.926163659469587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that using an external Language Model (LM) benefits
end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that
appear infrequently in the training set remains quite challenging. The long-tail
prediction problem has been widely studied in many applications, but has been
addressed by only a few studies for ASR and LMs. In this paper, we propose a new
memory augmented lookup dictionary based Transformer architecture for LM. The
newly introduced lookup dictionary incorporates rich contextual information from
the training set, which is vital for correctly predicting long-tail tokens.
Through extensive experiments on Chinese and English data sets, our proposed
method is shown to outperform the baseline Transformer LM by a large margin on
both word/character error rate and tail token error rate, without any impact on
decoding efficiency. Overall, we demonstrate the effectiveness of our proposed
method in boosting ASR decoding performance, especially for long-tail tokens.
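The abstract leaves the lookup mechanism at a high level; the sketch below is one hypothetical reading of it, a datastore of context embeddings collected from the training set whose nearest-neighbour distribution is interpolated with the Transformer LM's output. The names, distance measure, and interpolation weight are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): a lookup dictionary built from
# training-set context embeddings, queried at inference and interpolated with
# the Transformer LM's next-token distribution, kNN-LM style.
import numpy as np

class LookupDictionary:
    def __init__(self, keys, values, vocab_size, k=8, temperature=1.0):
        self.keys = keys              # (N, d) context embeddings from the training set
        self.values = values          # (N,)  next-token ids aligned with the keys
        self.vocab_size = vocab_size
        self.k = k
        self.temperature = temperature

    def lookup(self, query):
        """Next-token distribution from the k nearest stored contexts."""
        dists = np.linalg.norm(self.keys - query, axis=1)   # L2 distance to every key
        nearest = np.argsort(dists)[: self.k]
        weights = np.exp(-dists[nearest] / self.temperature)
        probs = np.zeros(self.vocab_size)
        for idx, w in zip(nearest, weights):
            probs[self.values[idx]] += w
        return probs / probs.sum()

def fused_next_token_probs(lm_probs, dictionary, query, lam=0.3):
    """Interpolate the Transformer LM distribution with the dictionary lookup;
    the lookup term is what mainly helps long-tail tokens."""
    return (1.0 - lam) * lm_probs + lam * dictionary.lookup(query)
```

During ASR decoding, such a fused distribution would stand in for the plain LM scores in shallow fusion or second-pass rescoring, which is where the long-tail gains described above would show up.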
Related papers
- LBPE: Long-token-first Tokenization to Improve Large Language Models [26.3619552256488]
Long tokens, rich in semantic information, have fewer occurrences in tokenized datasets compared to short tokens.
We propose LBPE, which prioritizes long tokens during the encoding process.
Experiments across diverse language modeling tasks demonstrate that LBPE consistently outperforms the original BPE.
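The summary only states that long tokens are prioritized during encoding; as a hedged illustration (not LBPE's published algorithm), a greedy longest-match encoder over a fixed vocabulary behaves that way. Function and variable names below are made up.

```python
# Hedged sketch of a "long-token-first" encoder: greedy longest match against a
# fixed vocabulary, with single characters as the fallback. LBPE's actual rule
# may differ.
def encode_longest_first(text: str, vocab: set[str], max_len: int = 16) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate substring first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if length == 1 or piece in vocab:   # a single character always matches
                tokens.append(piece)
                i += length
                break
    return tokens

# Example: with "token" in the vocabulary, "tokens" splits as ["token", "s"].
print(encode_longest_first("tokens", {"token", "to", "ke", "ns"}))
```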
arXiv Detail & Related papers (2024-11-08T12:03:36Z)
- MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression [5.5795785998430185]
MultiTok is a new tokenizing tool inspired by universal Lempel-Ziv-Welch data compression.
We show that MultiTok achieves a comparable performance to the BERT standard as a tokenizer.
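MultiTok's procedure is not detailed here; for reference, the snippet below is textbook LZW encoding over characters, which illustrates the compression idea the summary points to (repeated sequences become single dictionary entries), not MultiTok's actual tokenizer.

```python
# Textbook LZW encoding over characters, shown only as a reference point for
# the LZW-inspired idea mentioned above.
def lzw_encode(text: str) -> list[int]:
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}  # start from single chars
    current, codes = "", []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # add a new, longer entry
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

print(lzw_encode("abababab"))  # repeated "ab" pairs collapse into later codes
```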
arXiv Detail & Related papers (2024-10-28T21:24:51Z)
- It's Never Too Late: Fusing Acoustic Information into Large Language
Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of automatic speech recognition (ASR) output, but GER typically conditions on the decoded text alone.
In this work, we aim to overcome that limitation by infusing acoustic information before generating the predicted transcription, through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
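The summary gives only the fusion idea; the following is a hedged sketch of uncertainty-driven late fusion that weights an acoustic distribution more heavily when the LLM's token distribution is high-entropy. The weighting rule and names are assumptions, not necessarily UADF as published.

```python
# Hedged sketch: lean more on the acoustic distribution when the LLM's own
# next-token distribution is uncertain (high entropy).
import numpy as np

def dynamic_late_fusion(llm_probs, am_probs, max_acoustic_weight=0.7):
    entropy = -np.sum(llm_probs * np.log(llm_probs + 1e-12))
    alpha = max_acoustic_weight * entropy / np.log(len(llm_probs))  # 0 when LLM is certain
    fused = (1.0 - alpha) * llm_probs + alpha * am_probs
    return fused / fused.sum()

llm = np.array([0.25, 0.25, 0.25, 0.25])   # maximally uncertain LLM
am = np.array([0.70, 0.10, 0.10, 0.10])    # confident acoustic model
print(dynamic_late_fusion(llm, am))
```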
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach to recognize new words.
We use a memory-enhanced Automatic Speech Recognition model from previous work.
We show that with this approach, performance on the new words improves as they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
- Understanding the Role of Input Token Characters in Language Models: How
Does Information Loss Affect Performance? [45.53600782873268]
We study how information loss in input token characters affects the performance of pre-training language models.
Surprisingly, we find that even when pre-training under extreme settings, i.e., using only one character of each token, performance retention on standard NLU benchmarks and probing tasks is high.
For instance, a model pre-trained only on single first characters from tokens achieves performance retention of approximately $90$% and $77$% of the full-token model in SuperGLUE and GLUE tasks, respectively.
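As a concrete picture of that extreme setting, the toy function below keeps only the first character of every token; it is an illustrative preprocessing sketch, not the study's pipeline.

```python
# Toy illustration of the single-first-character setting described above.
def first_char_only(tokens: list[str]) -> list[str]:
    return [t[0] if t else t for t in tokens]

print(first_char_only(["language", "models", "are", "robust"]))  # ['l', 'm', 'a', 'r']
```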
arXiv Detail & Related papers (2023-10-26T09:47:50Z)
- End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation
and Lateral Inhibition [2.839471733237535]
We analyze several architectures and optimizations on Wild LRRo, an underrepresented, small-scale Romanian-language dataset.
We obtain state-of-the-art results using our proposed method, namely cross-lingual domain adaptation and unlabeled videos.
We also assess the performance of adding a layer inspired by the neural inhibition mechanism.
arXiv Detail & Related papers (2023-10-07T15:36:58Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
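The summary names the two objectives without detail; the sketch below shows one way their training targets could be constructed (replacement positions for detection, original tokens for denoising). It is an illustration of the objectives, not GanLM's training code.

```python
# Hedged sketch: a generator proposes replacements for some tokens; detection
# labels mark the replaced positions, denoising targets are the originals.
import random

def make_replaced_token_targets(tokens, sample_fn, replace_prob=0.15, seed=0):
    rng = random.Random(seed)
    corrupted, detect_labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(sample_fn(tok))   # generator's (possibly wrong) token
            detect_labels.append(1)            # replaced token detection target
        else:
            corrupted.append(tok)
            detect_labels.append(0)
    denoise_targets = list(tokens)             # replaced token denoising target
    return corrupted, detect_labels, denoise_targets
```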
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders on visual-language tasks, while the learning objective is altered to cater to the language-heavy characteristics of NLU.
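The learning objective is left abstract in the summary; as a generic stand-in (quite possibly different from XDBERT's actual objective), feature distillation adds an MSE term pulling the language encoder's pooled representation toward a frozen multimodal teacher's.

```python
# Generic feature-distillation stand-in, not necessarily XDBERT's objective:
# keep the NLU task loss and add an MSE pull toward the multimodal teacher.
import torch
import torch.nn.functional as F

def distill_step(student_repr: torch.Tensor,
                 teacher_repr: torch.Tensor,
                 task_loss: torch.Tensor,
                 alpha: float = 0.5) -> torch.Tensor:
    distill_loss = F.mse_loss(student_repr, teacher_repr.detach())
    return task_loss + alpha * distill_loss
```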
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Learning Rich Representation of Keyphrases from Text [12.698835743464313]
We show how to learn task-specific language models aimed at learning rich representations of keyphrases from text documents.
In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling with Replacement (KBIR)
In the generative setting, we introduce a new pre-training setup for BART - KeyBART, that reproduces the keyphrases related to the input text in the CatSeq format.
arXiv Detail & Related papers (2021-12-16T01:09:51Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein
Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
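To make the MLD objective concrete, the snippet below computes the standard Levenshtein distance with dynamic programming; RISE itself learns explicit edit actions rather than running this recurrence at inference time.

```python
# Standard Levenshtein (edit) distance via dynamic programming, shown only as
# background for the MLD objective named above.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution / match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3 edits
```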
arXiv Detail & Related papers (2021-06-30T08:44:19Z)