Automated Essay Scoring Using Transformer Models
- URL: http://arxiv.org/abs/2110.06874v1
- Date: Wed, 13 Oct 2021 17:09:47 GMT
- Title: Automated Essay Scoring Using Transformer Models
- Authors: Sabrina Ludwig, Christian Mayer, Christopher Hansen, Kerstin Eilers,
and Steffen Brandt
- Abstract summary: We consider a transformer-based approach for automated essay scoring (AES).
We compare its performance to a logistic regression model based on the bag-of-words (BOW) approach and discuss their differences.
We show how such models can help increase the accuracy of human raters.
- Score: 0.415623340386296
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automated essay scoring (AES) is gaining increasing attention in the
education sector as it significantly reduces the burden of manual scoring and
allows ad hoc feedback for learners. Natural language processing based on
machine learning has been shown to be particularly suitable for text
classification and AES. While many machine-learning approaches for AES still
rely on a bag-of-words (BOW) approach, we consider a transformer-based approach
in this paper, compare its performance to a logistic regression model based on
the BOW approach and discuss their differences. The analysis is based on 2,088
email responses to a problem-solving task that were manually labeled in terms
of politeness. Both transformer models considered in the analysis outperformed
the regression-based model without any hyperparameter tuning. We argue that
for AES tasks such as politeness classification, the transformer-based approach
has significant advantages, while a BOW approach suffers from not taking word
order into account and reducing the words to their stem. Further, we show how
such models can help increase the accuracy of human raters, and we provide
detailed instructions on how to implement transformer-based models for one's
own purposes.
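The BOW baseline contrasted with the transformers above can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation, and the toy politeness examples are invented purely for demonstration:

```python
# Minimal sketch of a BOW + logistic regression baseline for politeness
# classification, as contrasted with transformer models in the abstract.
# NOT the authors' implementation; the toy examples are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical dataset: 1 = polite, 0 = impolite.
texts = [
    "Could you please send me the report? Thank you very much.",
    "Would you kindly review my request when you have time?",
    "Send me the report now.",
    "Fix this immediately, I don't have all day.",
]
labels = [1, 1, 0, 0]

# CountVectorizer discards word order entirely, which is exactly the
# limitation the paper attributes to BOW approaches; a transformer
# instead attends over the full token sequence.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

predictions = model.predict(texts)
print(list(predictions))
```

A transformer-based classifier would replace the vectorizer and linear model with a pretrained encoder (e.g. a BERT-family model) fine-tuned on the labeled responses, at the cost of more compute but without the word-order and stemming losses of BOW.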
Related papers
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction [6.78974856327994]
Adverse Event (ADE) extraction is one of the core tasks in digital pharmacovigilance.
We evaluate 19 Transformer-based models for ADE extraction on informal texts.
At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.
arXiv Detail & Related papers (2023-06-08T15:25:24Z)
- Zero-Shot Automatic Pronunciation Assessment [19.971348810774046]
We propose a novel zero-shot APA method based on the pre-trained acoustic model, HuBERT.
Experimental results on speechocean762 demonstrate that the proposed method achieves comparable performance to supervised regression baselines.
arXiv Detail & Related papers (2023-05-31T05:17:17Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Learning to Perturb Word Embeddings for Out-of-distribution QA [55.103586220757464]
We propose a simple yet effective DA method based on a noise generator, which learns to perturb the word embedding of the input questions and context without changing their semantics.
We validate the performance of the QA models trained with our word embedding on a single source dataset, on five different target domains.
Notably, the model trained with ours outperforms the model trained with more than 240K artificially generated QA pairs.
arXiv Detail & Related papers (2021-05-06T14:12:26Z)
- Non-autoregressive Transformer-based End-to-end ASR using BERT [13.07939371864781]
This paper presents a transformer-based end-to-end automatic speech recognition (ASR) model based on BERT.
A series of experiments conducted on the AISHELL-1 dataset demonstrates competitive or superior results.
arXiv Detail & Related papers (2021-04-10T16:22:17Z)
- Automated essay scoring using efficient transformer-based language models [0.5161531917413708]
Automated Essay Scoring (AES) is a cross-disciplinary effort involving Education, Linguistics, and Natural Language Processing (NLP).
Large pretrained transformer-based language models have dominated the current state-of-the-art in many NLP tasks.
This paper challenges the paradigm in NLP that bigger is better when it comes to AES.
arXiv Detail & Related papers (2021-02-25T19:28:39Z)
- On Learning Text Style Transfer with Direct Rewards [101.97136885111037]
Lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task.
We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models.
Our model provides significant gains in both automatic and human evaluation over strong baselines.
arXiv Detail & Related papers (2020-10-24T04:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.