A Survey on Importance of Homophones Spelling Correction Model for Khmer Authors
- URL: http://arxiv.org/abs/2411.10477v1
- Date: Mon, 11 Nov 2024 10:07:03 GMT
- Title: A Survey on Importance of Homophones Spelling Correction Model for Khmer Authors
- Authors: Seanghort Born, Madeth May, Claudine Piau-Toffolon, Sébastien Iksal,
- Abstract summary: Homophones present a significant challenge to authors in any languages due to their similarities of pronunciations but different meanings and spellings.
This research aims to address the difficulties faced by Khmer authors when using homophones in their writing.
- Score: 0.0
- License:
- Abstract: Homophones present a significant challenge to authors in any languages due to their similarities of pronunciations but different meanings and spellings. This issue is particularly pronounced in the Khmer language, rich in homophones due to its complex structure and extensive character set. This research aims to address the difficulties faced by Khmer authors when using homophones in their writing and proposes potential solutions based on an extensive literature review and survey analysis. A survey of 108 Khmer native speakers, including students, employees, and professionals, revealed that many frequently encounter challenges with homophones in their writing, often struggling to choose the correct word based on context. The survey also highlighted the absence of effective tools to address homophone errors in Khmer, which complicates the writing process. Additionally, a review of existing studies on spelling correction in other languages, such as English, Azerbaijani, and Bangla, identified a lack of research focused specifically on homophones, particularly in the Khmer language. In summary, this research highlights the necessity for a specialized tool to address Khmer homophone errors. By bridging current gaps in research and available resources, such a tool would enhance the confidence and accuracy of Khmer authors in their writing, thereby contributing to the enrichment and preservation of the language. Continued efforts in this domain are essential for ensuring that Khmer can leverage advancements in technology and linguistics effectively.
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Improving Rare Words Recognition through Homophone Extension and Unified
Writing for Low-resource Cantonese Speech Recognition [36.10245119706219]
Homophone characters are common in tonal syllable-based languages, such as Mandarin and Cantonese.
This paper presents a novel homophone extension method to integrate human knowledge of the homophone lexicon into the beam search decoding process.
We also propose an automatic unified writing method to merge the variants of Cantonese characters.
arXiv Detail & Related papers (2023-02-02T02:46:32Z) - Accuracy of the Uzbek stop words detection: a case study on "School
corpus" [0.0]
We present a method to evaluate the quality of a list of stop words aimed at automatically creating techniques.
The method was tested on an automatically-generated list of stop words for the Uzbek language.
arXiv Detail & Related papers (2022-09-15T05:14:31Z) - Khmer Word Search: Challenges, Solutions, and Semantic-Aware Search [0.0]
Multiple orders of characters and different spelling realizations of words impose a constraint on Khmer word search functionality.
Spelling mistakes are common since robust spellcheckers are not commonly available across the input device platforms.
The proposed solutions include character order normalization, grapheme and phoneme-based spellcheckers, and Khmer word semantic model.
arXiv Detail & Related papers (2021-12-16T14:37:41Z) - Phoneme Recognition through Fine Tuning of Phonetic Representations: a
Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z) - Deciphering Undersegmented Ancient Scripts Using Phonetic Prior [31.707254394215283]
Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges.
We propose a model that handles both of these challenges by building on rich linguistic constraints.
We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian)
arXiv Detail & Related papers (2020-10-21T15:03:52Z) - A Multitask Learning Approach for Diacritic Restoration [21.288912928687186]
In many languages like Arabic, diacritics are used to specify pronunciations as well as meanings.
Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word.
We use Arabic as a case study since it has sufficient data resources for tasks that we consider in our joint modeling.
arXiv Detail & Related papers (2020-06-07T01:20:40Z) - A Corpus for Large-Scale Phonetic Typology [112.19288631037055]
We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology.
aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants.
arXiv Detail & Related papers (2020-05-28T13:03:51Z) - Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.