Autoregressive Score Generation for Multi-trait Essay Scoring
- URL: http://arxiv.org/abs/2403.08332v1
- Date: Wed, 13 Mar 2024 08:34:53 GMT
- Title: Autoregressive Score Generation for Multi-trait Essay Scoring
- Authors: Heejin Do, Yunsu Kim, Gary Geunbae Lee
- Abstract summary: We propose an autoregressive prediction of multi-trait scores (ArTS) for automated essay scoring (AES).
Unlike prior regression or classification methods, we redefine AES as a score-generation task, allowing a single model to predict multiple scores.
Experimental results confirm the efficacy of ArTS, showing average improvements of over 5% across both prompts and traits.
- Score: 8.531986117865946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, encoder-only pre-trained models such as BERT have been successfully
applied in automated essay scoring (AES) to predict a single overall score.
However, studies have yet to explore these models in multi-trait AES, possibly
due to the inefficiency of replicating BERT-based models for each trait.
Breaking away from the existing sole use of encoders, we propose an
autoregressive prediction of multi-trait scores (ArTS), incorporating a
decoding process by leveraging the pre-trained T5. Unlike prior regression or
classification methods, we redefine AES as a score-generation task, allowing a
single model to predict multiple scores. During decoding, the subsequent trait
prediction can benefit by conditioning on the preceding trait scores.
Experimental results confirm the efficacy of ArTS, showing average
improvements of over 5% across both prompts and traits.
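The score-generation formulation can be pictured with a short sketch: the hedged example below uses a HuggingFace T5 checkpoint to emit all trait scores as a single generated sequence, so each trait token is conditioned on the previously generated ones. The prompt template, trait names, and output format are illustrative assumptions, not the authors' released code, and an actual ArTS model would first be fine-tuned on essay-score pairs.

```python
# Minimal sketch of autoregressive multi-trait score generation with T5.
# Assumptions: HuggingFace transformers, a "t5-base" checkpoint, and an
# illustrative prompt/output format (not the authors' released code).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

essay = "Dear local newspaper, I believe computers benefit society because ..."
source = f"score the following essay: {essay}"

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
# All trait scores are generated as one token sequence, so later trait
# predictions are conditioned on the scores decoded before them.
output_ids = model.generate(**inputs, max_new_tokens=32)
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# A fine-tuned model would emit something like:
#   "content 8, organization 7, conventions 8, overall 23"
# which is then parsed back into numeric trait scores.
print(decoded)
```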
Related papers
- Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards [5.632624116225276]
We propose scoring-aware Multi-reward Reinforcement Learning (SaMRL)
SaMRL integrates actual evaluation schemes into the training process by designing QWK-based rewards with a mean-squared error penalty for multi-trait AES.
arXiv Detail & Related papers (2024-09-26T02:16:48Z)
- RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring [0.0]
Reasoning Distillation-Based Evaluation (RDBE) integrates interpretability to elucidate the rationale behind model scores.
Our experimental results demonstrate the efficacy of RDBE across all scoring rubrics considered in the dataset.
arXiv Detail & Related papers (2024-07-03T05:49:01Z)
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders.
It is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives.
For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- The MultiBERTs: BERT Reproductions for Robustness Analysis [86.29162676103385]
Re-running pretraining can lead to substantially different conclusions about performance.
We introduce MultiBERTs: a set of 25 BERT-base checkpoints.
The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures.
arXiv Detail & Related papers (2021-06-30T15:56:44Z)
- Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction [5.3431413737671525]
We propose DPA, a transfer learning framework with Discriminative Pre-training tasks for Academic performance prediction.
Compared to the previous state-of-the-art generative pre-training method, DPA is more sample-efficient, leading to faster convergence to a lower academic performance prediction error.
arXiv Detail & Related papers (2021-06-28T13:02:23Z)
- UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations [0.0]
This work describes a self-supervised data augmentation approach used to improve learning models' performance when only a moderate amount of labeled data is available.
Neural language models are fine-tuned using this procedure in the context of the AcCompl-it shared task at EVALITA 2020.
arXiv Detail & Related papers (2020-11-10T15:50:37Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
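The relevance-labels-as-target-words idea can also be sketched briefly: the hedged example below scores a query-document pair by comparing the model's logits for "true" and "false" label tokens at the first decoding step. The prompt template and label tokens are assumptions for illustration (monoT5-style), not the paper's released code, and a real ranker would be fine-tuned on relevance-labeled pairs first.

```python
# Hedged sketch of scoring documents by generating relevance labels as
# "target words" with a seq2seq model. The prompt template and label
# tokens below are illustrative assumptions.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

query = "effects of computers on society"
doc = "Computers let people communicate and learn online ..."
text = f"Query: {query} Document: {doc} Relevant:"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# One decoding step: compare the logits of the "true" and "false" tokens
# and use the normalized probability of "true" as the relevance score.
with torch.no_grad():
    decoder_input = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=decoder_input).logits[0, 0]
true_id = tokenizer.encode("true")[0]
false_id = tokenizer.encode("false")[0]
relevance = torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()
print(f"relevance score: {relevance:.3f}")
```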