Shallow Fusion of Weighted Finite-State Transducer and Language Model
for Text Normalization
- URL: http://arxiv.org/abs/2203.15917v1
- Date: Tue, 29 Mar 2022 21:34:35 GMT
- Title: Shallow Fusion of Weighted Finite-State Transducer and Language Model
for Text Normalization
- Authors: Evelina Bakhturina, Yang Zhang, Boris Ginsburg
- Abstract summary: We propose a new hybrid approach that combines the benefits of rule-based and neural systems.
First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one.
It achieves results comparable to or better than existing state-of-the-art TN models.
- Score: 13.929356163132558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text normalization (TN) systems in production are largely rule-based using
weighted finite-state transducers (WFST). However, WFST-based systems struggle
with ambiguous input when the normalized form is context-dependent. On the
other hand, neural text normalization systems can take context into account but
they suffer from unrecoverable errors and require labeled normalization
datasets, which are hard to collect. We propose a new hybrid approach that
combines the benefits of rule-based and neural systems. First, a
non-deterministic WFST outputs all normalization candidates, and then a neural
language model picks the best one -- similar to shallow fusion for automatic
speech recognition. While the WFST prevents unrecoverable errors, the language
model resolves contextual ambiguity. The approach is easy to extend and we show
it is effective. It achieves results comparable to or better than existing
state-of-the-art TN models.
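To make the pipeline concrete, here is a minimal sketch of the candidate-reranking step. The names `wfst_candidates` and `lm_log_prob` are hypothetical stand-ins (not the paper's API) for the non-deterministic WFST and for any neural LM that returns a sentence-level log-probability:

```python
# Minimal sketch of WFST + LM shallow fusion for text normalization.
# `wfst_candidates` and `lm_log_prob` are hypothetical stand-ins, not
# the paper's API: the former for a non-deterministic WFST that emits
# every normalization candidate, the latter for any neural LM scorer.
from typing import Callable, List

def normalize(sentence: str,
              wfst_candidates: Callable[[str], List[str]],
              lm_log_prob: Callable[[str], float]) -> str:
    candidates = wfst_candidates(sentence)
    # The WFST guarantees each candidate is a valid verbalization
    # (no unrecoverable errors); the LM resolves contextual ambiguity.
    return max(candidates, key=lm_log_prob)
```

For instance, "1/4" can verbalize as "one quarter" or "january fourth"; the LM score decides which reading fits the surrounding sentence.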
Related papers
- Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm [45.42075576656938]
Contextual biasing refers to the problem of biasing automatic speech recognition systems towards rare entities.
We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching.
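The summary does not spell out how match scores enter ASR decoding, but the core KMP machinery such a method builds over each biasing phrase is the failure (prefix) function; a minimal sketch:

```python
# Sketch of the Knuth-Morris-Pratt failure (prefix) function, computed
# once per rare-entity phrase. How the paper wires the resulting match
# states into decoding is not shown here.

def kmp_failure(pattern: list) -> list:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also its suffix; lets matching resume without rescanning."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

print(kmp_failure(list("abab")))  # [0, 0, 1, 2]
```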
arXiv Detail & Related papers (2023-09-29T22:50:10Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions.
With a well-crafted prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
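A hedged sketch of the general recipe such a benchmark enables: prompt an instruction-following LLM with the N-best list and ask for a corrected transcription. The `llm` callable and the prompt wording below are illustrative assumptions, not the benchmark's actual template:

```python
# Illustrative LLM-based ASR error correction from an N-best list.
# `llm` is a hypothetical callable wrapping any instruction-following
# model; the prompt is a sketch, not the HP benchmark's template.

def correct_with_llm(nbest: list, llm) -> str:
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = (
        "Below are N-best hypotheses from a speech recognizer for one "
        "utterance:\n" + hyps + "\n"
        "Return the most likely true transcription. You may combine "
        "hypotheses or restore words missing from all of them."
    )
    return llm(prompt)
```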
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z)
- Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
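As an illustration of the tag-per-word idea: each spoken word receives exactly one tag, so predictions can be read off token by token. The tag names below are invented for this sketch; the paper derives its actual tag vocabulary from granular alignment:

```python
# Sketch of tag-based ITN: one tag per input word. <SELF> and <DELETE>
# are illustrative tag names, not the paper's vocabulary.

def apply_tags(words: list, tags: list) -> str:
    out = []
    for word, tag in zip(words, tags):
        if tag == "<SELF>":       # keep the spoken word as-is
            out.append(word)
        elif tag == "<DELETE>":   # drop the word entirely
            continue
        else:                     # replace with the written form
            out.append(tag)
    return " ".join(out)

words = ["it", "costs", "ten", "dollars"]
tags  = ["<SELF>", "<SELF>", "$10", "<DELETE>"]
print(apply_tags(words, tags))  # it costs $10
```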
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
- An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer [37.0774363352316]
We propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input.
We also release the first publicly accessible large-scale dataset for Chinese text normalization.
arXiv Detail & Related papers (2022-03-31T11:19:53Z)
- Neural-FST Class Language Model for End-to-End Speech Recognition [30.670375747577694]
We propose a Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition.
We show that NFCLM significantly outperforms NNLM by 15.8% relative in terms of Word Error Rate.
arXiv Detail & Related papers (2022-01-28T00:20:57Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
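A conceptual PyTorch sketch of the factorization idea, with illustrative module names and dimensions: the blank logit and the vocabulary logits come from separate branches, so the vocabulary branch behaves like a standalone LM that can be adapted on text alone:

```python
# Conceptual sketch only; dimensions and module names are assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn

class FactorizedJoint(nn.Module):
    def __init__(self, enc_dim=512, pred_dim=512, vocab_size=1000):
        super().__init__()
        self.blank_head = nn.Linear(enc_dim + pred_dim, 1)
        self.vocab_pred = nn.Linear(pred_dim, vocab_size)  # LM-like branch
        self.enc_proj = nn.Linear(enc_dim, vocab_size)

    def forward(self, enc, pred):
        # Blank is predicted jointly; vocabulary logits are a sum of the
        # encoder projection and the adaptable prediction-network branch.
        blank = self.blank_head(torch.cat([enc, pred], dim=-1))
        vocab = self.enc_proj(enc) + self.vocab_pred(pred)
        return torch.cat([blank, vocab], dim=-1)
```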
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Neural Inverse Text Normalization [11.240669509034298]
We propose an efficient and robust neural solution for inverse text normalization.
We show that this can be easily extended to other languages without the need for a linguistic expert to manually curate them.
A transformer based model infused with pretraining consistently achieves a lower WER across several datasets.
arXiv Detail & Related papers (2021-02-12T07:53:53Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, VAEs tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, modeling conditional densities p(y|x).
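Concretely, such models rest on the standard conditional change-of-variables identity (stated here from first principles, not copied from the paper):

```latex
% f_x: invertible map, conditioned on x, from base noise z ~ p_Z to y.
\log p(y \mid x) = \log p_Z\!\left(f_x^{-1}(y)\right)
  + \log \left| \det \frac{\partial f_x^{-1}(y)}{\partial y} \right|
```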
arXiv Detail & Related papers (2019-11-29T19:17:58Z)