DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing
- URL: http://arxiv.org/abs/2402.16733v1
- Date: Wed, 21 Feb 2024 09:12:16 GMT
- Title: DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing
- Authors: Haneul Yoo, Jieun Han, So-Yeon Ahn, Alice Oh
- Abstract summary: We release DREsS, a large-scale, standard dataset for rubric-based automated essay scoring.
DREsS comprises three sub-datasets: DREsS_New, DREsS_Std., and DREsS_CASE.
- Score: 16.76905904995145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated essay scoring (AES) is a useful tool in English as a Foreign
Language (EFL) writing education, offering real-time essay scores for students
and instructors. However, previous AES models were trained on essays and scores
irrelevant to the practical scenarios of EFL writing education and usually
provided a single holistic score due to the lack of appropriate datasets. In
this paper, we release DREsS, a large-scale, standard dataset for rubric-based
automated essay scoring. DREsS comprises three sub-datasets: DREsS_New,
DREsS_Std., and DREsS_CASE. We collect DREsS_New, a real-classroom dataset with
1.7K essays authored by EFL undergraduate students and scored by English
education experts. We also standardize existing rubric-based essay scoring
datasets as DREsS_Std. We suggest CASE, a corruption-based augmentation
strategy for essays, which generates 20K synthetic samples of DREsS_CASE and
improves the baseline results by 45.44%. DREsS will enable further research to
provide a more accurate and practical AES system for EFL writing education.
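At a high level, the CASE strategy described above corrupts gold essays to create lower-quality training pairs. The sketch below is a minimal illustration of that idea, assuming sentence-level deletion and a proportional score penalty; the function, the corruption operation, and the scoring rule are illustrative assumptions, not the paper's exact procedure.

```python
import random

def corrupt_essay(essay: str, score: float, drop_rate: float = 0.3,
                  min_score: float = 1.0) -> tuple[str, float]:
    """Generate one synthetic (essay, score) pair by corrupting a gold essay.

    Illustrative only: drops a fraction of sentences and lowers the rubric
    score in proportion to how much text was removed. The actual corruption
    operations and score adjustment used for DREsS_CASE are defined in the
    paper.
    """
    sentences = [s.strip() for s in essay.split(".") if s.strip()]
    n_drop = max(1, int(len(sentences) * drop_rate))
    kept_idx = sorted(random.sample(range(len(sentences)), len(sentences) - n_drop))
    corrupted = ". ".join(sentences[i] for i in kept_idx) + "."
    # Assume the label degrades in proportion to the amount of removed text.
    new_score = max(min_score, round(score * (1 - drop_rate), 1))
    return corrupted, new_score

# Hypothetical usage: expand a small set of scored essays into synthetic pairs.
gold = ("Online classes give students flexibility. They also demand strong "
        "self-discipline. However, many students miss in-person discussion.")
synthetic = [corrupt_essay(gold, score=4.5) for _ in range(5)]
```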
Related papers
- Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection [10.198081881605226]
Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback.
This paper presents a novel framework leveraging Large Language Models (LLMs) and Transformers to generate synthetic Arabic essay datasets for AES.
Our approach produces realistic human-like essays, contributing a dataset of 3,040 annotated essays.
arXiv Detail & Related papers (2025-03-22T11:54:10Z)
- VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models [5.713983191152314]
VTechAGP is the first academic-to-general-audience text paraphrase dataset.
We also propose a novel dynamic soft prompt generative language model, DSPT5.
For training, we leverage a contrastive-generative loss function to learn the keyword in the dynamic prompt.
arXiv Detail & Related papers (2024-11-07T16:06:00Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing [82.33075210051129]
We introduce AceParse, the first comprehensive dataset designed to support the parsing of structured texts.
Based on AceParse, we fine-tuned a multimodal model, named Ace, which accurately parses various structured texts.
This model outperforms the previous state-of-the-art by 4.1% in terms of F1 score and by 5% in Jaccard Similarity.
arXiv Detail & Related papers (2024-09-16T06:06:34Z)
- "I understand why I got this grade": Automatic Short Answer Grading with Feedback [36.74896284581596]
We present a dataset of 5.8k student answers accompanied by reference answers and questions for the Automatic Short Answer Grading (ASAG) task.
The EngSAF dataset is meticulously curated to cover a diverse range of subjects, questions, and answer patterns from multiple engineering domains.
arXiv Detail & Related papers (2024-06-30T15:42:18Z)
- Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression [27.152245569974678]
We develop two models that automatically score English essays across multiple dimensions.
Our systems achieve strong performance on three evaluation criteria: precision, F1 score, and Quadratic Weighted Kappa (QWK; a generic computation is sketched after this entry).
arXiv Detail & Related papers (2024-06-03T10:59:50Z)
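Quadratic Weighted Kappa (QWK), named in the entry above, is the standard agreement metric in automated essay scoring. The snippet below shows a generic computation with scikit-learn; it is not the paper's own evaluation code, and the score values are hypothetical rubric labels.

```python
# Generic Quadratic Weighted Kappa (QWK) computation with scikit-learn;
# the scores below are hypothetical rubric labels on a 1-5 scale.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 4, 2, 5, 4, 3]   # rater's rubric scores
model_scores = [3, 4, 3, 5, 3, 3]   # model predictions

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")  # 1.0 = perfect agreement, 0.0 = chance level
```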
- RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education [15.253081304714101]
We present RECIPE4U, a dataset sourced from a semester-long experiment with 212 college students in English as a Foreign Language (EFL) writing courses.
During the study, students engaged in dialogues with ChatGPT to revise their essays. RECIPE4U includes comprehensive records of these interactions: conversation logs, students' intents, students' self-rated satisfaction, and students' essay edit histories.
arXiv Detail & Related papers (2024-03-13T05:51:57Z)
- Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition: Taking TOEFL Independent Writing Task for Example [25.220438332156114]
This study aims to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models.
It employs ChatGPT to conduct an automated evaluation of English essays, even with a small sample size.
arXiv Detail & Related papers (2024-01-07T07:13:50Z)
- A Benchmark for Text Expansion: Datasets, Metrics, and Baselines [87.47745669317894]
This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations in plain text.
We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references.
On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate superiority over Text2Text baselines.
arXiv Detail & Related papers (2023-09-17T07:54:38Z)
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps, data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
arXiv Detail & Related papers (2022-10-09T19:17:43Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms plain-text pre-training while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)