A New Dataset and Empirical Study for Sentence Simplification in Chinese
- URL: http://arxiv.org/abs/2306.04188v1
- Date: Wed, 7 Jun 2023 06:47:34 GMT
- Title: A New Dataset and Empirical Study for Sentence Simplification in Chinese
- Authors: Shiping Yang and Renliang Sun and Xiaojun Wan
- Abstract summary: This paper introduces CSS, a new dataset for assessing sentence simplification in Chinese.
We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications.
In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.
- Score: 50.0624778757462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentence Simplification is a valuable technique that can greatly benefit
language learners and children. However, current research focuses more on English
sentence simplification. The development of Chinese sentence simplification is
relatively slow due to the lack of data. To alleviate this limitation, this
paper introduces CSS, a new dataset for assessing sentence simplification in
Chinese. We collect manual simplifications from human annotators and perform
data analysis to show the difference between English and Chinese sentence
simplifications. Furthermore, we test several unsupervised and zero/few-shot
learning methods on CSS and analyze the automatic evaluation and human
evaluation results. In the end, we explore whether Large Language Models can
serve as high-quality Chinese sentence simplification systems by evaluating
them on CSS.
Related papers
- Difficulty Estimation and Simplification of French Text Using LLMs [1.0568851068989973]
We leverage large language models for language learning applications, focusing on estimating the difficulty of foreign language texts.
We develop a difficulty classification model using labeled examples, transfer learning, and large language models, demonstrating superior accuracy compared to previous approaches.
Our experiments are conducted on French texts, but our methods are language-agnostic and directly applicable to other foreign languages.
arXiv Detail & Related papers (2024-07-25T14:16:08Z)
- MCTS: A Multi-Reference Chinese Text Simplification Dataset [15.080614581458091]
There has been very little research on Chinese text simplification for a long time.
We introduce MCTS, a multi-reference Chinese text simplification dataset.
We evaluate the performance of several unsupervised methods and advanced large language models.
arXiv Detail & Related papers (2023-06-05T11:46:36Z)
- Multilingual Simplification of Medical Texts [49.469685530201716]
We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages.
We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses.
Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.
arXiv Detail & Related papers (2023-05-21T18:25:07Z)
- Elaborative Simplification as Implicit Questions Under Discussion [51.17933943734872]
This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework.
We show that explicitly modeling QUD provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse.
arXiv Detail & Related papers (2023-05-17T17:26:16Z)
- Sentence Simplification via Large Language Models [15.07021692249856]
Sentence Simplification aims to rephrase complex sentences into simpler sentences while retaining the original meaning.
Large Language Models (LLMs) have demonstrated the ability to perform a variety of natural language processing tasks.
arXiv Detail & Related papers (2023-02-23T12:11:58Z)
- Exploiting Summarization Data to Help Text Simplification [50.0624778757462]
We analyzed the similarity between text summarization and text simplification and exploited summarization data to help simplify.
We named these pairs Sum4Simp (S4S) and conducted human evaluations to show that S4S is high-quality.
arXiv Detail & Related papers (2023-02-14T15:32:04Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
- Chinese Lexical Simplification [29.464388721085548]
There is no research work on the Chinese lexical simplification (CLS) task.
To circumvent difficulties in acquiring annotations, we manually create the first benchmark dataset for CLS.
We present five different types of methods as baselines to generate substitute candidates for the complex word.
arXiv Detail & Related papers (2020-10-14T12:55:36Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
- MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases [20.84836431084352]
We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data.
MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data.
We evaluate our approach on English, French, and Spanish simplification benchmarks and closely match or outperform the previous best supervised results.
arXiv Detail & Related papers (2020-05-01T12:54:30Z)