A Unified Transformer-based Framework for Duplex Text Normalization
- URL: http://arxiv.org/abs/2108.09889v1
- Date: Mon, 23 Aug 2021 01:55:03 GMT
- Title: A Unified Transformer-based Framework for Duplex Text Normalization
- Authors: Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji
- Abstract summary: Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively.
We propose a unified framework for building a single neural duplex system that can simultaneously handle TN and ITN.
Our systems achieve state-of-the-art results on the Google TN dataset for English and Russian.
- Score: 33.90810154067128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text normalization (TN) and inverse text normalization (ITN) are essential
preprocessing and postprocessing steps for text-to-speech synthesis and
automatic speech recognition, respectively. Many methods have been proposed for
either TN or ITN, ranging from weighted finite-state transducers to neural
networks. Despite their impressive performance, these methods aim to tackle
only one of the two tasks but not both. As a result, in a complete spoken
dialog system, two separate models for TN and ITN need to be built. This
heterogeneity increases the technical complexity of the system, which in turn
increases the cost of maintenance in a production setting. Motivated by this
observation, we propose a unified framework for building a single neural duplex
system that can simultaneously handle TN and ITN. Combined with a simple but
effective data augmentation method, our systems achieve state-of-the-art
results on the Google TN dataset for English and Russian. They can also reach
over 95% sentence-level accuracy on an internal English TN dataset without any
additional fine-tuning. In addition, we create a cleaned dataset from the
Spoken Wikipedia Corpora for German and report the performance of our systems
on the dataset. Overall, experimental results demonstrate the proposed duplex
text normalization framework is highly effective and applicable to a range of
domains and languages.
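The abstract does not spell out the system's interface here; as a rough illustration, one common way to make a single seq2seq model "duplex" is to prepend a task tag so the same model can run in either direction. The `t5-small` backbone and the `tn:`/`itn:` tag format below are placeholder assumptions, not the authors' released model, and the model would need fine-tuning on TN/ITN data before its outputs are meaningful.

```python
# Minimal duplex TN/ITN sketch: one seq2seq model, task selected by a prefix tag.
# Assumptions: t5-small as a stand-in backbone; the "tn:"/"itn:" tag format is made up.
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_NAME = "t5-small"  # placeholder backbone, not the paper's checkpoint
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def duplex_normalize(text: str, direction: str) -> str:
    """direction: 'tn' (written -> spoken) or 'itn' (spoken -> written)."""
    inputs = tokenizer(f"{direction}: {text}", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# After fine-tuning, one model serves both directions:
print(duplex_normalize("$5", "tn"))             # e.g. "five dollars"
print(duplex_normalize("five dollars", "itn"))  # e.g. "$5"
```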
Related papers
- Text-To-Speech Synthesis In The Wild [76.71096751337888]
Text-to-speech (TTS) systems are traditionally trained using modest databases of studio-quality, prompted or read speech collected in benign acoustic environments such as anechoic rooms.
We introduce the TTS In the Wild (TITW) dataset, the result of a fully automated pipeline, applied to the VoxCeleb1 dataset commonly used for speaker recognition.
We show that a number of recent TTS models can be trained successfully using TITW-Easy, but that it remains extremely challenging to produce similar results using TITW-Hard.
arXiv Detail & Related papers (2024-09-13T10:58:55Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a versatile model, the Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT achieves substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Language Agnostic Data-Driven Inverse Text Normalization [6.43601166279978]
The inverse text normalization (ITN) problem attracts the attention of researchers from various fields.
Due to the scarcity of labeled spoken-written datasets, the studies on non-English data-driven ITN are quite limited.
We propose a language-agnostic data-driven ITN framework to fill this gap.
arXiv Detail & Related papers (2023-01-20T10:33:03Z)
- Improving Data Driven Inverse Text Normalization using Data Augmentation [14.820077884045645]
Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form.
We present a data augmentation technique that effectively generates rich spoken-written numeric pairs from out-of-domain textual data.
We empirically demonstrate that an ITN model trained with our data augmentation technique consistently outperforms an ITN model trained using only in-domain data; a sketch of such pair generation follows below.
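This summary does not give the paper's exact augmentation recipe; the following is a minimal sketch, assuming the third-party num2words package, of how spoken-written numeric pairs could be mined from out-of-domain text.

```python
# Hypothetical pair mining: verbalize every number found in raw text to get
# (spoken, written) training pairs. num2words is an assumption, not the
# paper's tooling.
import re
from num2words import num2words

def make_pairs(sentence: str):
    """Yield a (spoken_form, written_form) pair for each integer in the sentence."""
    for match in re.finditer(r"\d+", sentence):
        written = match.group()
        yield num2words(int(written)), written  # e.g. "250" -> "two hundred and fifty"

for spoken, written in make_pairs("The meeting has 250 attendees and runs for 3 hours."):
    print(f"{spoken!r} -> {written!r}")
```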
arXiv Detail & Related papers (2022-07-20T06:07:26Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages a domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems [15.401574286479546]
Building Text Normalization (TN) systems for Text-to-Speech (TTS) in new languages is hard.
We propose a novel architecture to facilitate this for multiple languages while using less than 3% of the data used by state-of-the-art systems on English.
We publish the first results on TN for TTS in Spanish and Tamil and demonstrate that the approach performs comparably to previous work on English.
arXiv Detail & Related papers (2021-04-15T21:14:28Z)
- Neural Inverse Text Normalization [11.240669509034298]
We propose an efficient and robust neural solution for inverse text normalization.
We show that this can be easily extended to other languages without the need for a linguistic expert to manually curate them.
A transformer-based model infused with pretraining consistently achieves a lower WER across several datasets.
arXiv Detail & Related papers (2021-02-12T07:53:53Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset; a toy sketch of the Siamese setup appears below.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
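BURT's actual code is not reproduced in this summary; the toy PyTorch sketch below only illustrates the Siamese pattern of one shared encoder scoring two inputs, with a bag-of-embeddings stand-in where BURT uses a BERT encoder.

```python
# Toy Siamese setup: the same encoder (shared weights) embeds both inputs,
# and similarity is computed on the two fixed-size vectors.
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    def __init__(self, vocab_size: int = 30000, dim: int = 256):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, dim)  # stand-in for BERT

    def forward(self, ids_a: torch.Tensor, ids_b: torch.Tensor) -> torch.Tensor:
        vec_a = self.encoder(ids_a)  # shared weights encode both branches
        vec_b = self.encoder(ids_b)
        return nn.functional.cosine_similarity(vec_a, vec_b)

model = SiameseEncoder()
a = torch.randint(0, 30000, (1, 8))  # stand-in token ids for two sentences
b = torch.randint(0, 30000, (1, 8))
print(model(a, b))  # similarity score in [-1, 1]
```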
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
- Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity [3.8673630752805432]
We present DataTuner, a neural, end-to-end data-to-text generation system.
We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier (the reranking step is sketched below).
We show that DataTuner achieves state-of-the-art results on the automated metrics across four major D2T datasets.
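DataTuner's implementation is not part of this summary; the sketch below shows only the generic generate-then-rerank pattern, with a hypothetical toy_fidelity scorer standing in for the paper's semantic fidelity classifier.

```python
# Generic generate-then-rerank: keep the candidate the fidelity scorer likes best.
def rerank(candidates, data, fidelity_score):
    """Return the candidate rated most faithful to the structured input `data`."""
    return max(candidates, key=lambda text: fidelity_score(data, text))

# Hypothetical stand-in scorer: count how many data values the text mentions.
def toy_fidelity(data, text):
    return sum(value.lower() in text.lower() for value in data.values())

data = {"name": "Aromi", "food": "Italian"}
candidates = ["Aromi serves food.", "Aromi serves Italian food."]
print(rerank(candidates, data, toy_fidelity))  # -> "Aromi serves Italian food."
```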
arXiv Detail & Related papers (2020-04-08T11:16:53Z)
- Few-shot Natural Language Generation for Task-Oriented Dialog [113.07438787659859]
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large annotated NLG corpus to acquire controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T18:48:33Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representation of the input text into its corresponding language, joint training with two target ends gives the shared encoder the potential to produce a language-independent semantic space; a toy sketch of this wiring follows.
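BiDAN's released code is not shown here; the PyTorch sketch below only illustrates the shared-encoder, two-decoder wiring, with made-up dimensions and randomly initialized layers.

```python
# Toy shared-encoder / two-decoder wiring: both decoders read the same memory,
# pushing the encoder toward a language-independent representation.
import torch
import torch.nn as nn

def make_decoder(dim: int, heads: int) -> nn.TransformerDecoder:
    layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
    return nn.TransformerDecoder(layer, num_layers=2)

class BiDecoderNet(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder_fwd = make_decoder(dim, heads)  # source -> target language
        self.decoder_bwd = make_decoder(dim, heads)  # auxiliary target -> source

    def forward(self, src, tgt_fwd, tgt_bwd):
        memory = self.encoder(src)  # shared semantic space for both decoders
        return self.decoder_fwd(tgt_fwd, memory), self.decoder_bwd(tgt_bwd, memory)

model = BiDecoderNet()
src = torch.randn(2, 7, 256)  # already-embedded stand-in inputs
out_a, out_b = model(src, torch.randn(2, 5, 256), torch.randn(2, 5, 256))
print(out_a.shape, out_b.shape)  # torch.Size([2, 5, 256]) each
```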
arXiv Detail & Related papers (2020-01-14T02:05:14Z)