PERCORE: A Deep Learning-Based Framework for Persian Spelling Correction with Phonetic Analysis
- URL: http://arxiv.org/abs/2407.14789v1
- Date: Sat, 20 Jul 2024 07:41:04 GMT
- Title: PERCORE: A Deep Learning-Based Framework for Persian Spelling Correction with Phonetic Analysis
- Authors: Seyed Mohammad Sadegh Dashti, Amid Khatibi Bardsiri, Mehdi Jafari Shahbazzadeh,
- Abstract summary: This research introduces a state-of-the-art Persian spelling correction system that seamlessly integrates deep learning techniques with phonetic analysis.
Our methodology effectively combines deep contextual analysis with phonetic insights, adeptly correcting both non-word and real-word spelling errors.
A thorough evaluation on a wide-ranging dataset confirms our system's superior performance compared to existing methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research introduces a state-of-the-art Persian spelling correction system that seamlessly integrates deep learning techniques with phonetic analysis, significantly enhancing the accuracy and efficiency of natural language processing (NLP) for Persian. Utilizing a fine-tuned language representation model, our methodology effectively combines deep contextual analysis with phonetic insights, adeptly correcting both non-word and real-word spelling errors. This strategy proves particularly effective in tackling the unique complexities of Persian spelling, including its elaborate morphology and the challenge of homophony. A thorough evaluation on a wide-ranging dataset confirms our system's superior performance compared to existing methods, with impressive F1-Scores of 0.890 for detecting real-word errors and 0.905 for correcting them. Additionally, the system demonstrates a strong capability in non-word error correction, achieving an F1-Score of 0.891. These results illustrate the significant benefits of incorporating phonetic insights into deep learning models for spelling correction. Our contributions not only advance Persian language processing by providing a versatile solution for a variety of NLP applications but also pave the way for future research in the field, emphasizing the critical role of phonetic analysis in developing effective spelling correction system.
Related papers
- Improving the quality of Persian clinical text with a novel spelling correction system [0.0]
The accuracy of spelling in Electronic Health Records (EHRs) is a critical factor for efficient clinical care, research, and ensuring patient safety.
This research aimed to develop an innovative approach for detecting and correcting spelling errors in Persian clinical text.
arXiv Detail & Related papers (2024-08-07T08:31:42Z) - Automatic Real-word Error Correction in Persian Text [0.0]
This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text.
We employ semantic analysis, feature selection, and advanced classifiers to enhance error detection and correction efficacy.
Our method achieves an impressive F-measure of 96.6% in the detection phase and an accuracy of 99.1% in the correction phase.
arXiv Detail & Related papers (2024-07-20T07:50:52Z) - Machine Translation Meta Evaluation through Translation Accuracy
Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z) - Understanding and Mitigating Classification Errors Through Interpretable
Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z) - Improving Language Models Meaning Understanding and Consistency by
Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings [2.2503811834154104]
Typographical Error Type Detection in Persian is a relatively understudied area.
This paper presents a compelling approach for detecting typographical errors in Persian texts.
The outcomes of our final method proved to be highly competitive, achieving an accuracy of 97.62%, precision of 98.83%, recall of 98.61%, and surpassing others in terms of speed.
arXiv Detail & Related papers (2023-05-19T15:05:39Z) - A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages [2.5874041837241304]
Spelling error correction is the task of identifying and rectifying misspelled words in texts.
Earlier efforts on spelling error correction in Bangla and resource-scarce Indic languages focused on rule-based, statistical, and machine learning-based methods.
We propose a novel detector-purificator-corrector, DPC based on denoising transformers by addressing previous issues.
arXiv Detail & Related papers (2022-11-07T17:59:05Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - An Approach to Mispronunciation Detection and Diagnosis with Acoustic,
Phonetic and Linguistic (APL) Embeddings [18.282632348274756]
Phonetic embeddings, extracted from ASR models trained with huge amount of word level annotations, can serve as a good representation of the content of input speech.
We propose to utilize Acoustic, Phonetic and Linguistic (APL) embedding features jointly for building a more powerful MD&D system.
arXiv Detail & Related papers (2021-10-14T11:25:02Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for
Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint)
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.