Chinese Lexical Simplification
- URL: http://arxiv.org/abs/2010.07048v1
- Date: Wed, 14 Oct 2020 12:55:36 GMT
- Title: Chinese Lexical Simplification
- Authors: Jipeng Qiang and Xinyu Lu and Yun Li and Yunhao Yuan and Yang Shi and
Xindong Wu
- Abstract summary: There is no prior research on the Chinese lexical simplification (CLS) task.
To circumvent difficulties in acquiring annotations, we manually create the first benchmark dataset for CLS.
We present five different types of methods as baselines to generate substitute candidates for the complex word.
- Score: 29.464388721085548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lexical simplification, the process of replacing complex words in a
given sentence with simpler alternatives of equivalent meaning, has attracted
much attention in many languages. Although the richness of the Chinese
vocabulary makes text very difficult to read for children and non-native
speakers, there has been no research on the Chinese lexical simplification
(CLS) task. To circumvent the difficulty of acquiring annotations, we manually
create the first benchmark dataset for CLS, which can be used to evaluate
lexical simplification systems automatically. To enable a more thorough
comparison, we present five types of methods as baselines for generating
substitute candidates for a complex word: a synonym-based approach, a word
embedding-based approach, a pretrained language model-based approach, a
sememe-based approach, and a hybrid approach. Finally, we design an
experimental evaluation of these baselines and discuss their advantages and
disadvantages. To the best of our knowledge, this is the first study of the
CLS task.
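The simplest of the baselines described above, the synonym-based approach, can be illustrated with a minimal sketch: look up substitute candidates for a complex word in a synonym table, then rank them by corpus frequency, a common proxy for simplicity. The synonym table and frequency counts below are hypothetical placeholders, not the paper's actual resources; a real system might instead draw candidates from HowNet sememes, word embeddings, or a masked language model.

```python
# Toy sketch of a synonym-based lexical simplification pipeline.
# All data here is illustrative, not from the paper's benchmark dataset.

# Hypothetical synonym resource for candidate generation.
SYNONYMS = {
    "皎洁": ["明亮", "洁白", "光亮"],  # "bright/clear (of moonlight)"
}

# Hypothetical word frequencies from a large corpus; higher = simpler.
FREQUENCY = {"明亮": 9500, "洁白": 4200, "光亮": 3100, "皎洁": 800}


def simplify(sentence: str, complex_word: str) -> str:
    """Replace complex_word with its most frequent simpler synonym, if any."""
    candidates = SYNONYMS.get(complex_word, [])
    # Keep only candidates more frequent (i.e. presumably simpler)
    # than the original word.
    simpler = [c for c in candidates
               if FREQUENCY.get(c, 0) > FREQUENCY.get(complex_word, 0)]
    if not simpler:
        return sentence  # no simpler substitute found; leave unchanged
    best = max(simpler, key=lambda c: FREQUENCY.get(c, 0))
    return sentence.replace(complex_word, best)


print(simplify("月光皎洁", "皎洁"))  # → 月光明亮
```

The other baselines differ mainly in how candidates are generated and ranked; the frequency-based ranking step shown here is one of several possible simplicity criteria.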
Related papers
- A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models [39.35525969831397]
This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task.
Experiments on five public datasets demonstrate that our approach significantly improves LLM performance.
arXiv Detail & Related papers (2024-10-05T04:06:56Z) - A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to decompose the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z) - Multilingual Lexical Simplification via Paraphrase Generation [19.275642346073557]
We propose a novel multilingual LS method via paraphrase generation.
We regard paraphrasing as a zero-shot translation task within multilingual neural machine translation.
Our approach significantly surpasses BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
arXiv Detail & Related papers (2023-07-28T03:47:44Z) - A New Dataset and Empirical Study for Sentence Simplification in Chinese [50.0624778757462]
This paper introduces CSS, a new dataset for assessing sentence simplification in Chinese.
We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications.
In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.
arXiv Detail & Related papers (2023-06-07T06:47:34Z) - Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language
Pre-training [50.100992353488174]
We introduce CDBERT, a new learning paradigm that enhances the semantics understanding ability of the Chinese PLMs with dictionary knowledge and structure of Chinese characters.
We name the two core modules of CDBERT as Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries.
Our paradigm demonstrates consistent improvements on previous Chinese PLMs across all tasks.
arXiv Detail & Related papers (2023-05-30T05:48:36Z) - NapSS: Paragraph-level Medical Text Simplification via Narrative
Prompting and Sentence-matching Summarization [46.772517928718216]
We propose a summarize-then-simplify two-stage strategy, which we call NapSS.
NapSS identifies the relevant content to simplify while ensuring that the original narrative flow is preserved.
Our model performs significantly better than the seq2seq baseline on an English medical corpus.
arXiv Detail & Related papers (2023-02-11T02:20:25Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence
Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - Improving Pre-trained Language Models with Syntactic Dependency
Prediction Task for Chinese Semantic Error Recognition [52.55136323341319]
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and more complex, such that even humans cannot easily recognize them.
arXiv Detail & Related papers (2022-04-15T13:55:32Z) - Enhancing Pre-trained Language Model with Lexical Simplification [41.34550924004487]
Lexical simplification (LS) is a recognized method to reduce such lexical diversity.
We propose a novel approach which can effectively improve the performance of PrLMs in text classification.
arXiv Detail & Related papers (2020-12-30T07:49:00Z) - LSBert: A Simple Framework for Lexical Simplification [32.75631197427934]
We propose LSBert, a lexical simplification framework based on the pretrained representation model BERT.
We show that our system outputs lexical simplifications that are grammatically correct and semantically appropriate.
arXiv Detail & Related papers (2020-06-25T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.