LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study
- URL: http://arxiv.org/abs/2409.08554v1
- Date: Fri, 13 Sep 2024 06:13:55 GMT
- Title: LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study
- Authors: Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee
- Abstract summary: Grapheme-to-phoneme (G2P) conversion is critical in speech processing.
Large language models (LLMs) have recently demonstrated significant potential in various language tasks.
We present a benchmarking dataset designed to assess G2P performance on sentence-level phonetic challenges of the Persian language.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grapheme-to-phoneme (G2P) conversion is critical in speech processing, particularly for applications like speech synthesis. G2P systems must possess linguistic understanding and contextual awareness of languages with polyphone words and context-dependent phonemes. Large language models (LLMs) have recently demonstrated significant potential in various language tasks, suggesting that their phonetic knowledge could be leveraged for G2P. In this paper, we evaluate the performance of LLMs in G2P conversion and introduce prompting and post-processing methods that enhance LLM outputs without additional training or labeled data. We also present a benchmarking dataset designed to assess G2P performance on sentence-level phonetic challenges of the Persian language. Our results show that by applying the proposed methods, LLMs can outperform traditional G2P tools, even in an underrepresented language like Persian, highlighting the potential of developing LLM-aided G2P systems.
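The abstract describes prompting an LLM for phonemes and then cleaning its output without extra training. A minimal sketch of such a pipeline is shown below; it is illustrative only and not the paper's implementation. `call_llm` is a hypothetical stand-in for any chat-completion API, and the phoneme inventory and post-processing rule (discard any token outside the known inventory) are assumptions for the example.

```python
# Illustrative sketch (not the paper's method): LLM-based G2P with a
# prompt template and rule-based post-processing of the raw output.

# Assumed phoneme inventory for the example; a real system would use
# the target language's full inventory (e.g. Persian phonemes).
PHONEME_INVENTORY = {"a", "e", "i", "o", "u", "b", "d", "g", "k", "l",
                     "m", "n", "p", "r", "s", "t", "z", "ʃ", "ʒ", "x"}

def build_prompt(sentence: str) -> str:
    """Ask the model for a space-separated phonemic transcription."""
    return (
        "Transcribe the sentence into phonemes, separated by spaces.\n"
        f"Sentence: {sentence}\nPhonemes:"
    )

def post_process(raw_output: str) -> list:
    """Keep only tokens from the known inventory, discarding
    hallucinated symbols, punctuation, and stray text."""
    return [tok for tok in raw_output.split() if tok in PHONEME_INVENTORY]

def g2p(sentence: str, call_llm) -> list:
    """End-to-end sketch: prompt the model, then clean its answer."""
    return post_process(call_llm(build_prompt(sentence)))

# Usage with a stubbed model response containing one invalid token:
fake_llm = lambda prompt: "s a l ?? a m"
print(g2p("salam", fake_llm))  # ['s', 'a', 'l', 'a', 'm']
```

The point of the post-processing step is that it requires no labeled data or fine-tuning: any constraint known in advance (here, the phoneme inventory) can filter the model's free-form output.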
Related papers
- Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance [67.26434607115392]
Large vision-language models (LVLMs) have achieved impressive results in various vision-language tasks.
LVLMs suffer from hallucinations caused by language bias, leading to diminished focus on images and ineffective visual comprehension.
We propose LACING to address the language bias of LVLMs with a muLtimodal duAl-attention meChanIsm (MDA) aNd soft-Image Guidance (IFG).
arXiv Detail & Related papers (2024-11-21T16:33:30Z)
- Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models [74.71484979138161]
Grapheme-to-phoneme (G2P) conversion is a crucial step in Text-to-Speech (TTS) systems.
Inspired by the success of Large Language Models (LLMs) in handling context-aware scenarios, contextual G2P conversion systems are proposed.
The efficacy of incorporating in-context knowledge retrieval (ICKR) into G2P conversion systems is demonstrated on the Librig2p dataset.
arXiv Detail & Related papers (2024-11-12T05:38:43Z)
- PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe significant gaps of 17% and 45% relative to humans on Rhyme Word Generation and Syllable Counting, respectively.
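Syllable counting, one of the diagnostic tasks above, can be approximated for English with a simple vowel-group heuristic. The sketch below is only a rough illustration of the task itself, not part of PhonologyBench; accurate syllabification generally requires a pronunciation lexicon.

```python
# Naive syllable-counting heuristic: count maximal vowel groups,
# with a crude correction for a trailing silent "e".
import re

def count_syllables(word: str) -> int:
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)  # maximal vowel runs
    n = len(groups)
    # Drop a final silent "e" ("cake" -> 1), but keep "-le" endings
    # ("table" -> 2), and never go below one syllable.
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

print(count_syllables("banana"))  # 3
print(count_syllables("cake"))    # 1
print(count_syllables("table"))   # 2
```

Heuristics like this fail on many words (e.g. loanwords and irregular spellings), which is precisely why the task is a useful probe of whether LLMs have genuine phonological knowledge rather than surface spelling rules.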
arXiv Detail & Related papers (2024-04-03T04:53:14Z)
- Toward Informal Language Processing: Knowledge of Slang in Large Language Models [16.42982896928428]
We construct a dataset that supports evaluation on a diverse set of tasks pertaining to automatic processing of slang.
For both evaluation and finetuning, we show the effectiveness of our dataset on two core applications.
We find that while LLMs such as GPT-4 achieve good performance in a zero-shot setting, smaller BERT-like models finetuned on our dataset achieve comparable performance.
arXiv Detail & Related papers (2024-04-02T21:50:18Z)
- Native Language Identification with Large Language Models [60.80452362519818]
We show that GPT models are proficient at Native Language Identification (NLI), with GPT-4 setting a new performance record of 91.7% on the TOEFL11 benchmark test set in a zero-shot setting.
We also show that unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes.
arXiv Detail & Related papers (2023-12-13T00:52:15Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for invoking Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings [12.669655363646257]
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation.
We propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings.
arXiv Detail & Related papers (2023-07-31T13:25:38Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation [32.75866643254402]
We show that neural G2P models are extremely sensitive to orthographic variations in graphemes, such as spelling mistakes.
We propose three controlled noise-introduction methods to synthesize noisy training data.
We incorporate contextual information into the baseline and propose a robust training strategy to stabilize the training process.
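Controlled noise injection of this kind can be sketched with simple character-level perturbations. The example below is in the spirit of the approach but does not reproduce the paper's three specific methods; the operations (substitute, delete, swap) and the noise rate are assumptions for illustration.

```python
# Illustrative controlled noise injection for G2P training data:
# perturb characters at a fixed rate to simulate spelling mistakes.
import random

def add_spelling_noise(word: str, rate: float = 0.1, seed: int = 0) -> str:
    """Return a noisy copy of `word`; a fixed seed keeps it reproducible."""
    rng = random.Random(seed)
    chars = list(word)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(["substitute", "delete", "swap"])
            if op == "substitute":
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])
                i += 1  # both characters consumed
            # "delete" appends nothing
        else:
            out.append(chars[i])
        i += 1
    return "".join(out)

print(add_spelling_noise("phoneme", rate=0.3, seed=42))
```

Training on pairs of noisy graphemes and clean phonemes in this way encourages the model to map misspelled inputs to the intended pronunciation, which is the robustness property the paper evaluates.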
arXiv Detail & Related papers (2022-02-21T13:29:30Z)
- RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications [4.619541348328938]
RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications.
The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming.
The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource.
arXiv Detail & Related papers (2020-09-11T15:26:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.