Multi-Module G2P Converter for Persian Focusing on Relations between
Words
- URL: http://arxiv.org/abs/2208.01371v1
- Date: Tue, 2 Aug 2022 11:33:48 GMT
- Title: Multi-Module G2P Converter for Persian Focusing on Relations between
Words
- Authors: Mahdi Rezaei, Negar Nayeri, Saeed Farzi, Hossein Sameti
- Abstract summary: Our proposed multi- module G2P system outperforms our end-to-end systems in terms of accuracy and speed.
The system is sequence-level rather than word-level, which allows it to effectively capture the unwritten relations between words.
- Score: 1.3764085113103217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the application of end-to-end and multi-module
frameworks for G2P conversion for the Persian language. The results demonstrate
that our proposed multi-module G2P system outperforms our end-to-end systems in
terms of accuracy and speed. The system consists of a pronunciation dictionary
as our look-up table, along with separate models to handle homographs, OOVs and
ezafe in Persian created using GRU and Transformer architectures. The system is
sequence-level rather than word-level, which allows it to effectively capture
the unwritten relations between words (cross-word information) necessary for
homograph disambiguation and ezafe recognition without the need for any
pre-processing. After evaluation, our system achieved a 94.48% word-level
accuracy, outperforming the previous G2P systems for Persian.
Related papers
- Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models [74.71484979138161]
Grapheme-to-phoneme (G2P) conversion is a crucial step in Text-to-Speech (TTS) systems.
Inspired by the success of Large Language Models (LLMs) in handling context-aware scenarios, contextual G2P conversion systems are proposed.
The efficacy of incorporating ICKR into G2P conversion systems is demonstrated thoroughly on the Librig2p dataset.
arXiv Detail & Related papers (2024-11-12T05:38:43Z) - LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study [2.8948274245812327]
Grapheme-to-phoneme (G2P) conversion is critical in speech processing.
Large language models (LLMs) have recently demonstrated significant potential in various language tasks.
We present a benchmarking dataset designed to assess G2P performance on sentence-level phonetic challenges of the Persian language.
arXiv Detail & Related papers (2024-09-13T06:13:55Z) - MetaKP: On-Demand Keyphrase Generation [52.48698290354449]
We introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents.
We present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases.
We demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.
arXiv Detail & Related papers (2024-06-28T19:02:59Z) - Fine-tuning CLIP Text Encoders with Two-step Paraphrasing [83.3736789315201]
We introduce a straightforward fine-tuning approach to enhance the representations of CLIP models for paraphrases.
Our model, which we call ParaCLIP, exhibits significant improvements over baseline CLIP models across various tasks.
arXiv Detail & Related papers (2024-02-23T06:11:50Z) - Improving grapheme-to-phoneme conversion by learning pronunciations from
speech recordings [12.669655363646257]
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation.
We propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings.
arXiv Detail & Related papers (2023-07-31T13:25:38Z) - Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme
Conversion [1.5020330976600735]
Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework that first transforms input sequences into character embeddings, obtains linguistic information using language models, and then predicts the phonemes based on global context.
We propose the Reinforcer that provides strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations.
arXiv Detail & Related papers (2023-03-14T09:15:51Z) - Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource
End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER)
arXiv Detail & Related papers (2021-03-12T10:10:13Z) - Diversifying Task-oriented Dialogue Response Generation with Prototype
Guided Paraphrasing [52.71007876803418]
Existing methods for Dialogue Response Generation (DRG) in Task-oriented Dialogue Systems ( TDSs) can be grouped into two categories: template-based and corpus-based.
We propose a prototype-based, paraphrasing neural network, called P2-Net, which aims to enhance quality of the responses in terms of both precision and diversity.
arXiv Detail & Related papers (2020-08-07T22:25:36Z) - Neural Machine Translation for Multilingual Grapheme-to-Phoneme
Conversion [13.543705472805431]
We present a single end-to-end trained neural G2P model that shares same encoder and decoder across multiple languages.
We show 7.2% average improvement in phoneme error rate over low resource languages and no over high resource ones compared to monolingual baselines.
arXiv Detail & Related papers (2020-06-25T06:16:29Z) - K{\o}psala: Transition-Based Graph Parsing via Efficient Training and
Effective Encoding [13.490365811869719]
We present Kopsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020.
Our system is a pipeline consisting of off-the-shelf models for everything but enhanced parsing, and for the latter, a transition-based graphencies adapted from Che et al.
Our demonstrates that a unified pipeline is effective for both Representation Parsing and Enhanced Universal Dependencies, according to average ELAS.
arXiv Detail & Related papers (2020-05-25T13:17:09Z) - The Paradigm Discovery Problem [121.79963594279893]
We formalize the paradigm discovery problem and develop metrics for judging systems.
We report empirical results on five diverse languages.
Our code and data are available for public use.
arXiv Detail & Related papers (2020-05-04T16:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.