Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users
- URL: http://arxiv.org/abs/2504.00799v2
- Date: Tue, 29 Apr 2025 17:15:18 GMT
- Title: Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users
- Authors: Xi Wang, Fanfei Meng, Shiyang Zhang, Lan Li
- Abstract summary: The accuracy of major E-dictionaries is seldom scrutinized, and little attention has been paid to how their corpora are constructed. This study adopts a combined method of experimentation, user survey, and dictionary critique to examine Youdao, one of the most widely used E-dictionaries in China. Results show that incomplete or misleading definitions can cause serious misunderstandings. The study further explores how such flawed definitions originate, highlighting issues in data processing and the integration of AI and machine learning technologies in dictionary construction.
- Score: 4.061449824145836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electronic dictionaries have largely replaced paper dictionaries and become central tools for L2 learners seeking to expand their vocabulary. Users often assume these resources are reliable and rarely question the validity of the definitions provided. The accuracy of major E-dictionaries is seldom scrutinized, and little attention has been paid to how their corpora are constructed. Research on dictionary use, particularly the limitations of electronic dictionaries, remains scarce. This study adopts a combined method of experimentation, user survey, and dictionary critique to examine Youdao, one of the most widely used E-dictionaries in China. The experiment involved a translation task paired with retrospective reflection. Participants were asked to translate sentences containing words that are insufficiently or inaccurately defined in Youdao. Their consultation behavior was recorded to analyze how faulty definitions influenced comprehension. Results show that incomplete or misleading definitions can cause serious misunderstandings. Additionally, students exhibited problematic consultation habits. The study further explores how such flawed definitions originate, highlighting issues in data processing and the integration of AI and machine learning technologies in dictionary construction. The findings suggest a need for better training in dictionary literacy for users, as well as improvements in the underlying AI models used to build E-dictionaries.
Related papers
- Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models [49.341876205074]
Direct speech translation (ST) has garnered increasing attention nowadays, yet the accurate translation of terminology within utterances remains a great challenge. We propose a novel Locate-and-Focus method for terminology translation. It first effectively locates the speech clips containing terminologies within the utterance to construct translation knowledge, minimizing irrelevant information for the ST model.
arXiv Detail & Related papers (2025-07-24T10:07:59Z) - Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control [43.860799289234755]
We propose a framework for evaluating feature dictionaries in the context of specific tasks, by comparing them against supervised feature dictionaries.
First, we demonstrate that supervised dictionaries achieve excellent approximation, control, and interpretability of model computations on the task.
We apply this framework to the indirect object identification (IOI) task using GPT-2 Small, with sparse autoencoders (SAEs) trained on either the IOI or OpenWebText datasets.
arXiv Detail & Related papers (2024-05-14T07:07:13Z) - An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z) - Learning Interpretable Queries for Explainable Image Classification with
Information Pursuit [18.089603786027503]
Information Pursuit (IP) is an explainable prediction algorithm that greedily selects a sequence of interpretable queries about the data.
This paper introduces a novel approach: learning a dictionary of interpretable queries directly from the dataset.
arXiv Detail & Related papers (2023-12-16T21:43:07Z) - Biomedical Named Entity Recognition via Dictionary-based Synonym
Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z) - A Survey on Zero Pronoun Translation [69.09774294082965]
Zero pronouns (ZPs) are frequently omitted in pro-drop languages, but should be recalled in non-pro-drop languages.
This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution.
We uncover a number of insightful findings such as: 1) ZPT is in line with the development trend of large language model; 2) data limitation causes learning bias in languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use.
arXiv Detail & Related papers (2023-05-17T13:19:01Z) - CSED: A Chinese Semantic Error Diagnosis Corpus [52.92010408053424]
We study the complicated problem of Chinese Semantic Error Diagnosis (CSED), which lacks relevant datasets.
The study of semantic errors is important because they are very common and may lead to syntactic irregularities or even problems of comprehension.
This paper proposes syntax-aware models to specifically adapt to the CSED task.
arXiv Detail & Related papers (2023-05-09T05:33:31Z) - A Study of Slang Representation Methods [3.511369967593153]
We study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding.
Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements.
arXiv Detail & Related papers (2022-12-11T21:56:44Z) - A Linguistic Investigation of Machine Learning based Contradiction
Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z) - Rethink about the Word-level Quality Estimation for Machine Translation
from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, where the expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely tag refinement strategy and tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z) - Does He Wink or Does He Nod? A Challenging Benchmark for Evaluating Word
Understanding of Language Models [0.6091702876917281]
Recent progress in pretraining language models on large corpora has resulted in large performance gains on many NLP tasks.
To assess what kind of knowledge is acquired, language models are commonly probed by querying them with "fill in the blank" style cloze questions.
We introduce WDLMPro to evaluate word understanding directly using dictionary definitions of words.
arXiv Detail & Related papers (2021-02-06T15:15:57Z) - Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z) - The Grievance Dictionary: Understanding Threatening Language Use [0.8373151777137792]
The Grievance Dictionary can be used to automatically understand language use in the context of grievance-fuelled violence threat assessment.
The dictionary was validated by applying it to texts written by violent and non-violent individuals.
arXiv Detail & Related papers (2020-09-10T12:06:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.