Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition
- URL: http://arxiv.org/abs/2502.11457v1
- Date: Mon, 17 Feb 2025 05:32:56 GMT
- Title: Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition
- Authors: Guanlin Li, Yuki Arase, Noel Crespi
- Abstract summary: This study aims to facilitate English as a Second Language learners' language acquisition by simplification.
We propose simplifying complex sentences to appropriate levels for learners while also increasing vocabulary coverage of the target level in the simplifications.
Our method employs token-level and sentence-level rewards, and iteratively trains the model on its self-generated outputs to guide the model to search for simplification hypotheses that satisfy the target attributes.
- Score: 11.700462697630696
- Abstract: Text simplification is crucial for improving accessibility and comprehension for English as a Second Language (ESL) learners. This study goes a step further and aims to facilitate ESL learners' language acquisition by simplification. Specifically, we propose simplifying complex sentences to appropriate levels for learners while also increasing vocabulary coverage of the target level in the simplifications. We achieve this without a parallel corpus by conducting reinforcement learning on a large language model. Our method employs token-level and sentence-level rewards, and iteratively trains the model on its self-generated outputs to guide the model to search for simplification hypotheses that satisfy the target attributes. Experiment results on CEFR-SP and TurkCorpus datasets show that the proposed method can effectively increase the frequency and diversity of vocabulary of the target level by more than $20\%$ compared to baseline models, while maintaining high simplification quality.
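The combination of token-level and sentence-level rewards is the core of the method. Below is a minimal, illustrative sketch (not the authors' implementation) of how a token-level vocabulary-coverage reward and a sentence-level level-match reward might be combined to score self-generated simplification hypotheses. `TARGET_VOCAB`, `predict_cefr_level`, and the weighting `alpha` are all hypothetical stand-ins: a real system would use a CEFR target-level word list and a trained sentence-level classifier (e.g. one built on CEFR-SP).

```python
# Illustrative sketch of the two reward granularities described in the
# abstract. All names and values here are assumptions for illustration,
# not the authors' code.

TARGET_VOCAB = {"improve", "learn", "simple", "word", "sentence"}  # toy target-level word list
TARGET_LEVEL = "B1"  # the learner's target CEFR level

def token_level_reward(tokens):
    """Fraction of tokens drawn from the target-level vocabulary.
    Rewards hypotheses that increase coverage of target-level words."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in TARGET_VOCAB)
    return hits / len(tokens)

def predict_cefr_level(sentence):
    """Stub for a sentence-level CEFR classifier; a real system would use
    a trained model rather than this length heuristic."""
    return "B1" if len(sentence.split()) < 15 else "B2"

def sentence_level_reward(sentence):
    """1.0 if the hypothesis matches the target level, else 0.0."""
    return 1.0 if predict_cefr_level(sentence) == TARGET_LEVEL else 0.0

def combined_reward(sentence, alpha=0.5):
    """Weighted mix of the token- and sentence-level signals, in the
    spirit of the paper's two reward granularities."""
    tokens = sentence.split()
    return alpha * token_level_reward(tokens) + (1 - alpha) * sentence_level_reward(sentence)

# Toy loop: score self-generated hypotheses and keep the best one.
# The paper instead uses these rewards for iterative RL fine-tuning.
hypotheses = [
    "Learners improve when each sentence uses a simple word.",
    "Pedagogical efficacy is contingent upon lexical accessibility.",
]
best = max(hypotheses, key=combined_reward)
print(f"best hypothesis: {best!r} (reward={combined_reward(best):.2f})")
```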
Related papers
- DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
- Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
- A New Dataset and Empirical Study for Sentence Simplification in Chinese [50.0624778757462]
This paper introduces CSS, a new dataset for assessing sentence simplification in Chinese.
We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications.
In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.
arXiv Detail & Related papers (2023-06-07T06:47:34Z)
- Preference-grounded Token-level Guidance for Language Model Fine-tuning [99.93045967478764]
Aligning language models with preferences is an important problem in natural language generation.
Depending on the amount of supervised data available, we present two minimalist learning objectives for LM training that utilize the learned guidance.
In experiments, our method performs competitively on two distinct representative LM tasks.
arXiv Detail & Related papers (2023-06-01T07:00:07Z)
- Sentence Simplification via Large Language Models [15.07021692249856]
Sentence Simplification aims to rephrase complex sentences into simpler sentences while retaining their original meaning.
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing tasks.
arXiv Detail & Related papers (2023-02-23T12:11:58Z)
- Automatic Lexical Simplification for Turkish [0.0]
We present the first automatic lexical simplification system for the Turkish language.
Recent text simplification efforts rely on manually crafted simplified corpora and comprehensive NLP tools.
We present a new text simplification pipeline based on the pretrained representation model BERT together with morphological features to generate grammatically correct and semantically appropriate word-level simplifications.
arXiv Detail & Related papers (2022-01-15T15:58:44Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases [20.84836431084352]
We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data.
MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data.
We evaluate our approach on English, French, and Spanish simplification benchmarks and closely match or outperform the previous best supervised results.
arXiv Detail & Related papers (2020-05-01T12:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.