Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models
- URL: http://arxiv.org/abs/2505.12973v1
- Date: Mon, 19 May 2025 11:11:12 GMT
- Title: Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models
- Authors: Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee
- Abstract summary: Homograph disambiguation remains a significant challenge in grapheme-to-phoneme (G2P) conversion. We propose a semi-automated pipeline for constructing homograph-focused datasets, introduce the HomoRich dataset, and demonstrate its effectiveness. We also improve one of the most well-known rule-based G2P systems, eSpeak, into a fast homograph-aware version, HomoFast eSpeak.
- Score: 2.8948274245812327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Homograph disambiguation remains a significant challenge in grapheme-to-phoneme (G2P) conversion, especially for low-resource languages. This challenge is twofold: (1) creating balanced and comprehensive homograph datasets is labor-intensive and costly, and (2) specific disambiguation strategies introduce additional latency, making them unsuitable for real-time applications such as screen readers and other accessibility tools. In this paper, we address both issues. First, we propose a semi-automated pipeline for constructing homograph-focused datasets, introduce the HomoRich dataset generated through this pipeline, and demonstrate its effectiveness by applying it to enhance a state-of-the-art deep learning-based G2P system for Persian. Second, we advocate for a paradigm shift - utilizing rich offline datasets to inform the development of fast, rule-based methods suitable for latency-sensitive accessibility applications like screen readers. To this end, we improve one of the most well-known rule-based G2P systems, eSpeak, into a fast homograph-aware version, HomoFast eSpeak. Our results show an approximate 30% improvement in homograph disambiguation accuracy for the deep learning-based and eSpeak systems.
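The paradigm the abstract advocates lends itself to a compact illustration. The sketch below is a hypothetical reconstruction, not the actual HomoFast eSpeak code: it assumes context-cue rules for each homograph have been mined offline from a rich dataset such as HomoRich, then shipped as a constant-time lookup layer in front of a placeholder rule-based engine. The rule table, cue words, and phoneme strings are invented English examples for readability (the paper targets Persian).

```python
"""Minimal sketch of fast, rule-based homograph disambiguation.

Hypothetical illustration of the paper's paradigm, not the HomoFast eSpeak
implementation: mine context cues from a rich offline dataset once, then
apply them at runtime as a dictionary lookup with no neural model in the loop.
"""

# Offline step (assumed): for each homograph, the most discriminative
# neighboring words and the pronunciation they select are extracted from
# the dataset and frozen into a plain table.
HOMOGRAPH_RULES = {
    "lead": {"default": "l iy d",  # verb /li:d/ vs. the metal /lEd/
             "cues": {"pencil": "l eh d", "pipe": "l eh d"}},
    "read": {"default": "r iy d",  # present tense vs. past tense
             "cues": {"yesterday": "r eh d", "already": "r eh d"}},
}

def base_g2p(word: str) -> str:
    """Stand-in for a rule-based engine such as eSpeak's letter-to-sound rules."""
    return " ".join(word)  # placeholder: real engines apply cascaded rules

def g2p(tokens: list[str], window: int = 3) -> list[str]:
    """Homograph-aware G2P: O(1) dictionary work per token, screen-reader fast."""
    phonemes = []
    for i, tok in enumerate(tokens):
        entry = HOMOGRAPH_RULES.get(tok.lower())
        if entry is None:
            phonemes.append(base_g2p(tok))
            continue
        # Scan a small context window for a cue word mined offline.
        context = [w.lower() for w in tokens[max(0, i - window): i + window + 1]]
        choice = entry["default"]
        for cue, pron in entry["cues"].items():
            if cue in context:
                choice = pron
                break
        phonemes.append(choice)
    return phonemes

if __name__ == "__main__":
    print(g2p("I read the book yesterday".split()))
    # ['I', 'r eh d', 't h e', 'b o o k', 'y e s t e r d a y']
```

Because disambiguation reduces to a windowed dictionary scan, the latency cost over plain eSpeak is negligible, which is the property the paper argues matters for accessibility tools.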
Related papers
- Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [75.9865035064794]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework for the post-retrieval phase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z) - Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation [0.0]
This paper introduces an intermediate language specifically designed for Persian language processing. Our methodology combines two key components: Large Language Model (LLM) prompting techniques and a specialized sequence-to-sequence machine transliteration architecture.
arXiv Detail & Related papers (2025-05-10T11:10:48Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models [74.71484979138161]
Grapheme-to-phoneme (G2P) conversion is a crucial step in Text-to-Speech (TTS) systems.
Inspired by the success of Large Language Models (LLMs) in handling context-aware scenarios, contextual G2P conversion systems are proposed.
The efficacy of incorporating ICKR into G2P conversion systems is demonstrated thoroughly on the Librig2p dataset.
arXiv Detail & Related papers (2024-11-12T05:38:43Z) - LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study [2.8948274245812327]
Grapheme-to-phoneme (G2P) conversion is critical in speech processing.
Large language models (LLMs) have recently demonstrated significant potential in various language tasks.
We present a benchmarking dataset designed to assess G2P performance on sentence-level phonetic challenges of the Persian language.
arXiv Detail & Related papers (2024-09-13T06:13:55Z) - Parameter-Efficient Tuning Large Language Models for Graph Representation Learning [62.26278815157628]
We introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning.
We use a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt.
We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations.
arXiv Detail & Related papers (2024-04-28T18:36:59Z) - GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network [49.91919718254597]
Large Language Models (LLMs) exhibit strong In-Context Learning capabilities when prompts with demonstrations are used.
Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality.
GNNavi employs a Graph Neural Network layer to precisely guide the aggregation and distribution of information flow during the processing of prompts.
arXiv Detail & Related papers (2024-02-18T21:13:05Z) - Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Learning Strong Graph Neural Networks with Weak Information [64.64996100343602]
We develop a principled approach to the problem of graph learning with weak information (GLWI). We propose D$^2$PT, a dual-channel GNN framework that performs long-range information propagation not only on the input graph with incomplete structure, but also on a global graph that encodes global semantic similarities.
arXiv Detail & Related papers (2023-05-29T04:51:09Z) - LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion [18.83348872103488]
Grapheme-to-phoneme (G2P) conversion maps letters to their corresponding pronunciations.
Existing methods are either slow or poor in performance, and are limited in application scenarios.
We propose a novel method named LiteG2P which is fast, light and theoretically parallel.
arXiv Detail & Related papers (2023-03-02T09:16:21Z) - Multi-Module G2P Converter for Persian Focusing on Relations between Words [1.3764085113103217]
Our proposed multi-module G2P system outperforms our end-to-end systems in terms of accuracy and speed.
The system is sequence-level rather than word-level, which allows it to effectively capture the unwritten relations between words.
arXiv Detail & Related papers (2022-08-02T11:33:48Z) - r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation [32.75866643254402]
We show that neural G2P models are extremely sensitive to orthographical variations in graphemes like spelling mistakes.
We propose three controlled noise-introducing methods to synthesize noisy training data. We incorporate contextual information into the baseline and propose a robust training strategy to stabilize the training process.
arXiv Detail & Related papers (2022-02-21T13:29:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.