Shallow Fusion of Weighted Finite-State Transducer and Language Model
for Text Normalization
- URL: http://arxiv.org/abs/2203.15917v1
- Date: Tue, 29 Mar 2022 21:34:35 GMT
- Title: Shallow Fusion of Weighted Finite-State Transducer and Language Model
for Text Normalization
- Authors: Evelina Bakhturina, Yang Zhang, Boris Ginsburg
- Abstract summary: We propose a new hybrid approach that combines the benefits of rule-based and neural systems.
First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one.
It achieves results comparable to or better than existing state-of-the-art TN models.
- Score: 13.929356163132558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text normalization (TN) systems in production are largely rule-based using
weighted finite-state transducers (WFST). However, WFST-based systems struggle
with ambiguous input when the normalized form is context-dependent. On the
other hand, neural text normalization systems can take context into account but
they suffer from unrecoverable errors and require labeled normalization
datasets, which are hard to collect. We propose a new hybrid approach that
combines the benefits of rule-based and neural systems. First, a
non-deterministic WFST outputs all normalization candidates, and then a neural
language model picks the best one -- similar to shallow fusion for automatic
speech recognition. While the WFST prevents unrecoverable errors, the language
model resolves contextual ambiguity. The approach is easy to extend and we show
it is effective. It achieves results comparable to or better than existing
state-of-the-art TN models.
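To make the pipeline concrete, here is a minimal sketch of the candidate-reranking step. The names `wfst_candidates` and `lm_log_prob` are hypothetical stand-ins (not the paper's API) for the non-deterministic WFST and for any neural LM that returns a sentence-level log-probability:

```python
# Minimal sketch of WFST + LM shallow fusion for text normalization.
# `wfst_candidates` and `lm_log_prob` are hypothetical stand-ins, not
# the paper's API: the former for a non-deterministic WFST that emits
# every normalization candidate, the latter for any neural LM scorer.
from typing import Callable, List

def normalize(sentence: str,
              wfst_candidates: Callable[[str], List[str]],
              lm_log_prob: Callable[[str], float]) -> str:
    candidates = wfst_candidates(sentence)
    # The WFST guarantees each candidate is a valid verbalization
    # (no unrecoverable errors); the LM resolves contextual ambiguity.
    return max(candidates, key=lm_log_prob)
```

For instance, "1/4" can verbalize as "one quarter" or "january fourth"; the LM score decides which reading fits the surrounding sentence.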
Related papers
- Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm [45.42075576656938]
Contextual biasing refers to the problem of biasing automatic speech recognition systems towards rare entities.
We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching.
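The summary does not spell out how match scores enter ASR decoding, but the core KMP machinery such a method builds over each biasing phrase is the failure (prefix) function; a minimal sketch:

```python
# Sketch of the Knuth-Morris-Pratt failure (prefix) function, computed
# once per rare-entity phrase. How the paper wires the resulting match
# states into decoding is not shown here.

def kmp_failure(pattern: list) -> list:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also its suffix; lets matching resume without rescanning."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

print(kmp_failure(list("abab")))  # [0, 0, 1, 2]
```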
arXiv Detail & Related papers (2023-09-29T22:50:10Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions.
With a well-crafted prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
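A hedged sketch of the general recipe such a benchmark enables: prompt an instruction-following LLM with the N-best list and ask for a corrected transcription. The `llm` callable and the prompt wording below are illustrative assumptions, not the benchmark's actual template:

```python
# Illustrative LLM-based ASR error correction from an N-best list.
# `llm` is a hypothetical callable wrapping any instruction-following
# model; the prompt is a sketch, not the HP benchmark's template.

def correct_with_llm(nbest: list, llm) -> str:
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = (
        "Below are N-best hypotheses from a speech recognizer for one "
        "utterance:\n" + hyps + "\n"
        "Return the most likely true transcription. You may combine "
        "hypotheses or restore words missing from all of them."
    )
    return llm(prompt)
```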
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z)
- Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
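As an illustration of the tag-per-word idea: each spoken word receives exactly one tag, so predictions can be read off token by token. The tag names below are invented for this sketch; the paper derives its actual tag vocabulary from granular alignment:

```python
# Sketch of tag-based ITN: one tag per input word. <SELF> and <DELETE>
# are illustrative tag names, not the paper's vocabulary.

def apply_tags(words: list, tags: list) -> str:
    out = []
    for word, tag in zip(words, tags):
        if tag == "<SELF>":       # keep the spoken word as-is
            out.append(word)
        elif tag == "<DELETE>":   # drop the word entirely
            continue
        else:                     # replace with the written form
            out.append(tag)
    return " ".join(out)

words = ["it", "costs", "ten", "dollars"]
tags  = ["<SELF>", "<SELF>", "$10", "<DELETE>"]
print(apply_tags(words, tags))  # it costs $10
```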
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
- An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer [37.0774363352316]
We propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input.
We also release the first publicly accessible large-scale dataset for Chinese text normalization.
arXiv Detail & Related papers (2022-03-31T11:19:53Z)
- Neural-FST Class Language Model for End-to-End Speech Recognition [30.670375747577694]
We propose a Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition.
We show that NFCLM significantly outperforms NNLM by 15.8% relative in terms of Word Error Rate.
arXiv Detail & Related papers (2022-01-28T00:20:57Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
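A conceptual PyTorch sketch of the factorization idea, with illustrative module names and dimensions: the blank logit and the vocabulary logits come from separate branches, so the vocabulary branch behaves like a standalone LM that can be adapted on text alone:

```python
# Conceptual sketch only; dimensions and module names are assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn

class FactorizedJoint(nn.Module):
    def __init__(self, enc_dim=512, pred_dim=512, vocab_size=1000):
        super().__init__()
        self.blank_head = nn.Linear(enc_dim + pred_dim, 1)
        self.vocab_pred = nn.Linear(pred_dim, vocab_size)  # LM-like branch
        self.enc_proj = nn.Linear(enc_dim, vocab_size)

    def forward(self, enc, pred):
        # Blank is predicted jointly; vocabulary logits are a sum of the
        # encoder projection and the adaptable prediction-network branch.
        blank = self.blank_head(torch.cat([enc, pred], dim=-1))
        vocab = self.enc_proj(enc) + self.vocab_pred(pred)
        return torch.cat([blank, vocab], dim=-1)
```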
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Neural Inverse Text Normalization [11.240669509034298]
We propose an efficient and robust neural solution for inverse text normalization.
We show that this can be easily extended to other languages without the need for a linguistic expert to manually curate them.
A transformer based model infused with pretraining consistently achieves a lower WER across several datasets.
arXiv Detail & Related papers (2021-02-12T07:53:53Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, VAEs tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, modeling conditional densities p(y|x).
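Concretely, such models rest on the standard conditional change-of-variables identity (stated here from first principles, not copied from the paper):

```latex
% f_x: invertible map, conditioned on x, from base noise z ~ p_Z to y.
\log p(y \mid x) = \log p_Z\!\left(f_x^{-1}(y)\right)
  + \log \left| \det \frac{\partial f_x^{-1}(y)}{\partial y} \right|
```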
arXiv Detail & Related papers (2019-11-29T19:17:58Z)