External Knowledge Augmented Polyphone Disambiguation Using Large
Language Model
- URL: http://arxiv.org/abs/2312.11920v1
- Date: Tue, 19 Dec 2023 08:00:10 GMT
- Title: External Knowledge Augmented Polyphone Disambiguation Using Large
Language Model
- Authors: Chen Li
- Abstract summary: We introduce a novel method to solve the problem as a generation task.
Retrieval module incorporates external knowledge which is a multi-level semantic dictionary of Chinese polyphonic characters.
Generation module adopts the decoder-only Transformer architecture to induce the target text.
Postprocess module corrects the generated text into a valid result if needed.
- Score: 3.372242769313867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key issues in Mandarin Chinese text-to-speech (TTS) systems is
polyphone disambiguation when doing grapheme-to-phoneme (G2P) conversion. In
this paper, we introduce a novel method to solve the problem as a generation
task. Following the trending research of large language models (LLM) and prompt
learning, the proposed method consists of three modules. Retrieval module
incorporates external knowledge which is a multi-level semantic dictionary of
Chinese polyphonic characters to format the sentence into a prompt. Generation
module adopts the decoder-only Transformer architecture to induce the target
text. Postprocess module corrects the generated text into a valid result if
needed. Experimental results show that our method outperforms the existing
methods on a public dataset called CPP. We also empirically study the impacts
of different templates of the prompt, different sizes of training data, and
whether to incorporate external knowledge.
Related papers
- Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model [25.459787361454353]
We present a novel framework named R2S that leverages the CoD-Chain of Dialogue logic to guide large language models (LLMs) in generating knowledge-intensive multi-turn dialogues for instruction tuning.
By integrating raw documents from both open-source datasets and domain-specific web-crawled documents into a benchmark K-BENCH, we cover diverse areas such as Wikipedia (English), Science (Chinese), and Artifacts (Chinese)
arXiv Detail & Related papers (2024-07-03T12:04:10Z) - Multi-Modal Retrieval For Large Language Model Based Speech Recognition [15.494654232953678]
We propose multi-modal retrieval with two approaches: kNN-LM and cross-attention techniques.
We show that speech-based multi-modal retrieval outperforms text based retrieval.
We achieve state-of-the-art recognition results on the Spoken-Squad question answering dataset.
arXiv Detail & Related papers (2024-06-13T22:55:22Z) - Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z) - Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation [16.49758711633611]
Large Language Models (LLMs) have shown exceptional language generation capabilities in response to text-based prompts.
In this work, we explore the use of Prompt Tuning to achieve controlled language generation.
We demonstrate the efficacy of our method towards mitigating harmful, toxic, and biased text generated by language models.
arXiv Detail & Related papers (2024-04-08T01:54:28Z) - Unsupervised Sign Language Translation and Generation [72.01216288379072]
We introduce an unsupervised sign language translation and generation network (USLNet)
USLNet learns from abundant single-modality (text and video) data without parallel sign language data.
We propose a sliding window method to address the issues of aligning variable-length text with video sequences.
arXiv Detail & Related papers (2024-02-12T15:39:05Z) - Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone
Disambiguation [35.35236347070773]
We build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic character, and a Phoneme-to-Grapheme (P2G) model to predict pronunciation into text.
We design a data balance strategy to improve the accuracy of some typical polyphonic characters in the training set with imbalanced distribution or data scarcity.
arXiv Detail & Related papers (2022-11-17T12:37:41Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Pretrained Language Models for Dialogue Generation with Multiple Input
Sources [101.17537614998805]
In this work, we study dialogue models with multiple input sources adapted from the pretrained language model GPT2.
We explore various methods to fuse multiple separate attention information corresponding to different sources.
Our experimental results show that proper fusion methods deliver higher relevance with dialogue history than simple fusion baselines.
arXiv Detail & Related papers (2020-10-15T07:53:28Z) - SPLAT: Speech-Language Joint Pre-Training for Spoken Language
Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.