Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model
- URL: http://arxiv.org/abs/2502.01090v1
- Date: Mon, 03 Feb 2025 06:23:35 GMT
- Title: Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model
- Authors: Jiali Chen, Xusen Hei, Yuqi Xue, Zihan Wu, Jiayuan Xie, Yi Cai
- Abstract summary: Chinese literary classics hold significant cultural and educational value.
These works often include classical Chinese and complex narratives, making them difficult for children to read.
We introduce a child-friendly literary adaptation task to adapt Chinese literary classics into engaging and accessible text for children.
- Score: 9.814667586928246
- License:
- Abstract: Chinese literary classics hold significant cultural and educational value, offering deep insights into morality, history, and human nature. These works often include classical Chinese and complex narratives, making them difficult for children to read. To bridge this gap, we introduce a child-friendly literary adaptation (CLA) task to adapt Chinese literary classics into engaging and accessible text for children. However, recent large language models (LLMs) overlook children's reading preferences (i.e., vivid character portrayals, concise narrative structures, and appropriate readability), which poses challenges in CLA. In this paper, we propose a method called InstructChild, which augments the LLM with these preferences for adaptation. Specifically, we first obtain the characters' personalities and narrative structure as additional information for fine-grained instruction tuning. Then, we devise a readability metric as the reward to align the LLM with children's reading level. Finally, a lookahead decoding strategy is applied to improve the readability of the generated text during inference. To support evaluation of the CLA task, we construct the Classic4Children dataset, which comprises both the original and child-friendly versions of the Four Great Classical Novels of Chinese literature. Experimental results show that InstructChild significantly improves performance in both automatic and human evaluations.
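The readability-reward and lookahead-decoding ideas in the abstract can be illustrated with a minimal sketch: at each step, candidate continuations are re-ranked by a readability score, and the most readable one is kept. The scoring function below is a toy stand-in (penalizing long words and long sentences), not the paper's learned metric, and the candidate lists are assumed to come from an LLM's sampling step.

```python
def readability_score(text: str) -> float:
    """Toy readability proxy: shorter words and shorter sentences
    score higher. A stand-in for the learned readability reward."""
    words = text.split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    avg_sent_len = len(words) / max(len(sentences), 1)
    # Higher score means easier to read; the weights are arbitrary.
    return 1.0 / (avg_word_len + 0.5 * avg_sent_len)

def lookahead_select(candidates: list[str]) -> str:
    """Re-rank candidate continuations by readability and keep the best,
    mimicking readability-guided lookahead re-ranking at inference."""
    return max(candidates, key=readability_score)
```

For example, given a plain continuation and a more ornate one, the re-ranker prefers the plainer text, which is the behavior the lookahead strategy is meant to encourage.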
Related papers
- Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs [50.0874045899661]
We introduce CharacterBot, a model designed to replicate both the linguistic patterns and distinctive thought processes of a character.
Using Lu Xun as a case study, we propose four training tasks derived from his 17 essay collections.
These include a pre-training task focused on mastering external linguistic structures and knowledge, as well as three fine-tuning tasks.
We evaluate CharacterBot on three tasks for linguistic accuracy and opinion comprehension, demonstrating that it significantly outperforms the baselines on our adapted metrics.
arXiv Detail & Related papers (2025-02-18T16:11:54Z) - "Once Upon a Time..." Literary Narrative Connectedness Progresses with Grade Level: Potential Impact on Reading Fluency and Literacy Skills [0.0]
This study explores the narrative dynamics of literary texts used in schools.
We examined a dataset of 1,627 literary texts spanning 13 years of education.
arXiv Detail & Related papers (2025-02-10T22:21:29Z) - Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving [43.148203559785095]
Large language models (LLMs) with impressive multilingual capabilities may offer a way to meet the demanding requirements of classical Chinese poetry translation.
This paper first introduces a suitable benchmark (PoetMT) in which each Chinese poem has a recognized elegant translation.
We propose a new metric based on GPT-4 to evaluate the extent to which current LLMs can meet these demands.
arXiv Detail & Related papers (2024-08-19T12:34:31Z) - Are Large Language Models Capable of Generating Human-Level Narratives? [114.34140090869175]
This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression.
We introduce a novel computational framework to analyze narratives through three discourse-level aspects.
We show that explicit integration of discourse features can enhance storytelling, as demonstrated by an over 40% improvement in neural storytelling.
arXiv Detail & Related papers (2024-07-18T08:02:49Z) - LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 works of literary fiction that were either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fiction (e.g., novel type, number of characters, year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z) - On the Automatic Generation and Simplification of Children's Stories [14.465545222216749]
We first examine the ability of several popular large language models to generate stories with properly adjusted lexical and readability levels.
As a second experiment, we explore the ability of state-of-the-art lexical simplification models to generalize to the domain of children's stories.
We find that even the strongest-performing current lexical simplification models do not perform as well on material designed for children, due to their reliance on large language models behind the scenes.
arXiv Detail & Related papers (2023-10-27T21:31:34Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-Centric Summarization [67.1483219601714]
We propose a novel question generation method that first learns the question type distribution of an input story paragraph.
We finetune a pre-trained transformer-based sequence-to-sequence model using silver samples composed by educational question-answer pairs.
Our work indicates the necessity of decomposing question type distribution learning and event-centric summary generation for educational question generation.
arXiv Detail & Related papers (2022-03-27T02:21:19Z) - Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature [0.0]
We explore the use of lexical features towards improving readability identification of children's books written in Filipino.
Results show that combining lexical features (LEX), consisting of type-token ratio, lexical density, lexical variation, and foreign word count, with traditional features (TRAD) increased the performance of readability models by almost a 5% margin.
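Two of the lexical features named above, type-token ratio and lexical density, are straightforward to compute. The sketch below is illustrative only: lexical density properly requires a POS tagger or function-word lexicon, so a tiny hand-picked stopword set stands in for one here.

```python
import re

# Small illustrative function-word set; a real system would use a full
# stopword lexicon or POS tagger (assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "or", "but", "is", "was",
             "to", "of", "in", "it", "at", "ang", "ng", "sa"}

def lexical_features(text: str) -> dict[str, float]:
    """Compute two of the LEX features: type-token ratio and an
    approximate lexical density."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    if not tokens:
        return {"type_token_ratio": 0.0, "lexical_density": 0.0}
    content_words = [t for t in tokens if t not in STOPWORDS]
    return {
        # Distinct words / total words: higher = more varied vocabulary.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Content words / total words: higher = denser information load.
        "lexical_density": len(content_words) / len(tokens),
    }
```

Features like these are then fed, alongside traditional surface features, into a readability classifier.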
arXiv Detail & Related papers (2021-01-22T19:54:37Z) - Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning [94.50608198582636]
Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques.
We take a novel perspective on IF game solving and re-formulate it as Multi-Passage Reading Comprehension (MPRC) tasks.
arXiv Detail & Related papers (2020-10-05T23:09:20Z) - A Comparative Study of Feature Types for Age-Based Text Classification [3.867363075280544]
We compare the effectiveness of various types of linguistic features for the task of age-based classification of fiction texts.
The results obtained show that the features describing the text at the document level can significantly increase the quality of machine learning models.
arXiv Detail & Related papers (2020-09-24T18:41:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.