DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing
- URL: http://arxiv.org/abs/2402.16733v1
- Date: Wed, 21 Feb 2024 09:12:16 GMT
- Title: DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing
- Authors: Haneul Yoo, Jieun Han, So-Yeon Ahn, Alice Oh
- Abstract summary: We release DREsS, a large-scale, standard dataset for rubric-based automated essay scoring.
DREsS comprises three sub-datasets: DREsS_New, DREsS_Std., and DREsS_CASE.
- Score: 16.76905904995145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated essay scoring (AES) is a useful tool in English as a Foreign
Language (EFL) writing education, offering real-time essay scores for students
and instructors. However, previous AES models were trained on essays and scores
irrelevant to the practical scenarios of EFL writing education and usually
provided a single holistic score due to the lack of appropriate datasets. In
this paper, we release DREsS, a large-scale, standard dataset for rubric-based
automated essay scoring. DREsS comprises three sub-datasets: DREsS_New,
DREsS_Std., and DREsS_CASE. We collect DREsS_New, a real-classroom dataset with
1.7K essays authored by EFL undergraduate students and scored by English
education experts. We also standardize existing rubric-based essay scoring
datasets as DREsS_Std. We suggest CASE, a corruption-based augmentation
strategy for essays, which generates 20K synthetic samples of DREsS_CASE and
improves the baseline results by 45.44%. DREsS will enable further research to
provide a more accurate and practical AES system for EFL writing education.
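The abstract names CASE as a corruption-based augmentation strategy but does not describe how it works. The sketch below only illustrates the general idea of corruption-based augmentation under assumptions that are not taken from the paper: an essay is a list of sentences, a corruption replaces randomly chosen sentences with unrelated "distractor" sentences, and the synthetic score is the original rubric score scaled by the fraction of sentences left intact. The function name `corrupt_essay`, its parameters, and the scoring rule are all hypothetical.

```python
import random


def corrupt_essay(sentences, distractors, score, n_corrupt, rng):
    """Return (corrupted_sentences, synthetic_score).

    Hypothetical illustration, not the paper's CASE procedure:
    replaces `n_corrupt` randomly chosen sentences with distractor
    sentences and scales the score by the share of untouched sentences.
    """
    if not sentences:
        raise ValueError("essay must contain at least one sentence")
    if not 0 <= n_corrupt <= len(sentences):
        raise ValueError("n_corrupt must be between 0 and len(sentences)")

    # Pick which sentence positions to corrupt, without repetition.
    corrupt_positions = set(rng.sample(range(len(sentences)), n_corrupt))

    corrupted = [
        rng.choice(distractors) if i in corrupt_positions else sentence
        for i, sentence in enumerate(sentences)
    ]

    # Assumed scoring rule: the score shrinks linearly with corruption.
    intact_ratio = (len(sentences) - n_corrupt) / len(sentences)
    return corrupted, round(score * intact_ratio, 2)


if __name__ == "__main__":
    essay = [
        "Online classes save commuting time.",
        "They also let students review recorded lectures.",
        "However, they reduce face-to-face interaction.",
        "Overall, the benefits outweigh the drawbacks.",
    ]
    distractors = ["The recipe calls for two cups of flour."]
    rng = random.Random(0)  # fixed seed for reproducibility
    new_essay, new_score = corrupt_essay(essay, distractors, 4.0, 2, rng)
    print(new_score)
    print("\n".join(new_essay))
```

Sweeping `n_corrupt` from 0 to the essay length would yield several synthetic (essay, score) pairs per original essay, which is one plausible way a small scored corpus could be expanded into a much larger training set.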
Related papers
- "I understand why I got this grade": Automatic Short Answer Grading with Feedback [36.74896284581596]
We present a dataset of 5.8k student answers accompanied by reference answers and questions for the Automatic Short Answer Grading (ASAG) task.
The EngSAF dataset is meticulously curated to cover a diverse range of subjects, questions, and answer patterns from multiple engineering domains.
arXiv Detail & Related papers (2024-06-30T15:42:18Z)
- Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression [27.152245569974678]
We develop two models that automatically score English essays across multiple dimensions.
Our systems achieve impressive performance in evaluation using three criteria: precision, F1 score, and Quadratic Weighted Kappa.
arXiv Detail & Related papers (2024-06-03T10:59:50Z)
- RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education [15.253081304714101]
We present RECIPE4U, a dataset sourced from a semester-long experiment with 212 college students in English as a Foreign Language (EFL) writing courses.
During the study, students engaged in dialogues with ChatGPT to revise their essays. RECIPE4U includes comprehensive records of these interactions, including conversation logs, students' intent, students' self-rated satisfaction, and students' essay edit histories.
arXiv Detail & Related papers (2024-03-13T05:51:57Z)
- Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example [25.220438332156114]
This study aims to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models.
This study employs ChatGPT to conduct an automated evaluation of English essays, even with a small sample size.
arXiv Detail & Related papers (2024-01-07T07:13:50Z)
- Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
arXiv Detail & Related papers (2023-12-31T02:13:18Z)
- FABRIC: Automated Scoring and Feedback Generation for Essays [41.979996110725324]
We present FABRIC, a pipeline to help students and instructors in English writing classes by automatically generating 1) the overall scores, 2) specific rubric-based scores, and 3) detailed feedback on how to improve the essays.
We evaluate the effectiveness of the new DREsS and the augmentation strategy CASE quantitatively and show significant improvements over the models trained with existing datasets.
arXiv Detail & Related papers (2023-10-08T15:00:04Z)
- A Benchmark for Text Expansion: Datasets, Metrics, and Baselines [87.47745669317894]
This work presents a new task, Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations of plain text.
We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references.
On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which outperform the Text2Text baselines.
arXiv Detail & Related papers (2023-09-17T07:54:38Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect, with high accuracy, samples that cause oversensitivity and overstability.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms the pre-training of plain text using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.