Related papers: Calibrating Generative AI to Produce Realistic Essays for Data Augmentation

URL: http://arxiv.org/abs/2602.06772v1
Date: Fri, 06 Feb 2026 15:27:57 GMT
Title: Calibrating Generative AI to Produce Realistic Essays for Data Augmentation
Authors: Edward W. Wolfe, Justin O. Barber
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data augmentation can mitigate limited training data in machine-learning automated scoring engines (ASEs) for constructed-response items. This study seeks to determine how well three approaches to large language model prompting produce essays that preserve the writing quality of the original essays and produce realistic text for augmenting ASE training datasets. We created simulated versions of student essays, and human raters assigned scores to them and rated the realism of the generated text. The results indicate that the "predict next" prompting strategy produces the highest agreement between human raters on simulated essay scores; the "predict next" and "sentence" strategies best preserve the rated quality of the original essays in the simulated essays; and the "predict next" and "25 examples" strategies produce the most realistic text as judged by human raters.

Related papers

Enhancing Essay Cohesion Assessment: A Novel Item Response Theory Approach
This work proposes and analyses the performance of a cohesion score prediction approach based on item response theory. The proposed approach outperforms conventional machine learning models and ensemble methods in several evaluation metrics.
arXiv, 2025-07-11
Machine-assisted writing evaluation: Exploring pre-trained language models in analyzing argumentative moves
The study investigates the efficacy of pre-trained language models (PLMs) in analyzing argumentative moves in a longitudinal learner corpus. A longitudinal corpus of 1,643 argumentative texts from 235 English learners in China is collected and annotated into six move types. The results indicate that PLMs analyze argumentative moves reliably, with an overall F1 score of 0.743, surpassing existing models in the field.
arXiv, 2025-03-25
SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation
Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue. For the persona-based dialogue generation task, consistency and coherence remain major challenges for language models. The authors propose SimOAP, a two-stage strategy consisting of over-sampling followed by post-evaluation.
arXiv, 2023-05-18
Large Language Models are Diverse Role-Players for Summarization Evaluation
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most automatic evaluation methods, such as BLEU and ROUGE, may not adequately capture these dimensions. We propose a new LLM-based evaluation framework that compares generated text with reference text from both objective and subjective aspects.
arXiv, 2023-03-27
Toward Fairness in Text Generation via Mutual Information Minimization based on Importance Sampling
We propose to minimize the mutual information between the semantics of generated sentences and their demographic polarity. In this way, the mention of a demographic group is encouraged to be independent of how it is described in the generated text. We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing.
arXiv, 2023-02-25
MOCHA: A Multi-Task Training Approach for Coherent Text Generation from Cognitive Perspective
We propose a novel multi-task training strategy for coherent text generation grounded in the cognitive theory of writing. We extensively evaluate our model on three open-ended generation tasks: story generation, news article writing, and argument generation.
arXiv, 2022-10-26
Analyzing and Evaluating Faithfulness in Dialogue Summarization
We first perform a fine-grained human analysis of the faithfulness of dialogue summaries and observe that over 35% of generated summaries are unfaithful to their source dialogues. We present a new model-level faithfulness evaluation method that examines generation models with multiple-choice questions created by rule-based transformations.
arXiv, 2022-10-21
Model Criticism for Long-Form Text Generation
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text. We perform experiments on three representative aspects of high-level discourse: coherence, coreference, and topicality. We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv, 2022-10-16
Data-to-text Generation with Variational Sequential Planning
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way. We infer latent plans sequentially with a structured variational model, while interleaving the steps of planning and generation.
arXiv, 2022-02-28
Fine-tuning GPT-3 for Russian Text Summarization
This paper showcases ruGPT3's ability to summarize texts by fine-tuning it on a corpus of Russian news articles paired with human-written summaries. We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes to the architecture or loss function.
arXiv, 2021-08-07

This list is automatically generated from the titles and abstracts of the papers in this site.