Improving Performance of Automated Essay Scoring by using
back-translation essays and adjusted scores
- URL: http://arxiv.org/abs/2203.00354v1
- Date: Tue, 1 Mar 2022 11:05:43 GMT
- Title: Improving Performance of Automated Essay Scoring by using
back-translation essays and adjusted scores
- Authors: You-Jin Jong (1), Yong-Jin Kim (2), Ok-Chol Ri (1) ((1) Kum Sung
Middle School Number 2, Pyongyang, D.P.R. of Korea, (2) Faculty of
Mathematics, KIM IL SUNG University, Pyongyang, D.P.R. of Korea)
- Abstract summary: We propose a method to increase the number of essay-score pairs using back-translation and score adjustment.
We evaluate the effectiveness of the augmented data using models from prior work.
Training on the augmented data improved the performance of the models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated essay scoring plays an important role in judging students'
language abilities in education. Traditional approaches score essays with
handcrafted features and are time-consuming and complicated. Recently, neural
network approaches have improved performance without any feature engineering.
Unlike other natural language processing tasks, however, only a small number of
datasets are publicly available for automated essay scoring, and they are not
sufficiently large. Because the performance of a neural network is closely
related to the size of its training data, this lack of data limits the
performance of automated essay scoring models. In this paper, we propose a
method to increase the number of essay-score pairs using back-translation and
score adjustment, and apply it to the Automated Student Assessment Prize (ASAP)
dataset for augmentation. We evaluate the effectiveness of the augmented data
using models from prior work, and additionally with a model based on long
short-term memory (LSTM), which is widely used for automated essay scoring.
Training on the augmented data improved the performance of the models.
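The abstract only names the two ingredients, back-translation and score adjustment, so the following is a minimal Python sketch of how such a pipeline could look. The MarianMT checkpoints from Hugging Face `transformers` and the vocabulary-overlap adjustment rule are assumptions for illustration; the abstract does not specify the paper's translation system or its adjustment rule.

```python
# Sketch of back-translation augmentation with adjusted scores.
# Assumed (not from the paper): MarianMT en<->de checkpoints as the
# translator, and a vocabulary-overlap heuristic as the score adjustment.
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

tok_fwd, mt_fwd = load("Helsinki-NLP/opus-mt-en-de")  # English -> German
tok_bwd, mt_bwd = load("Helsinki-NLP/opus-mt-de-en")  # German -> English

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch, max_new_tokens=512)
    return tok.batch_decode(out, skip_special_tokens=True)

def back_translate(essays):
    # Round trip en -> de -> en yields a paraphrase of each essay.
    return translate(translate(essays, tok_fwd, mt_fwd), tok_bwd, mt_bwd)

def adjust_score(score, original, paraphrase):
    # Hypothetical rule: scale the gold score by the fraction of the
    # original vocabulary the round trip preserved, so degraded
    # paraphrases receive proportionally lower scores.
    orig, para = set(original.lower().split()), set(paraphrase.lower().split())
    overlap = len(orig & para) / max(len(orig), 1)
    return round(score * overlap)

essays = ["The experiment shows that students write better with timely feedback."]
scores = [8]
paraphrases = back_translate(essays)
augmented = [(p, adjust_score(s, e, p))
             for e, s, p in zip(essays, scores, paraphrases)]
print(augmented)
```

One detail any real implementation would need to handle: each ASAP prompt has its own score range, so an adjustment rule would have to clamp adjusted scores to the valid range of the prompt the essay belongs to.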
Related papers
- Influence Scores at Scale for Efficient Language Data Sampling [3.072340427031969]
"influence scores" are used to identify important subsets of data.
In this paper, we explore the applicability of influence scores in language classification tasks.
arXiv Detail & Related papers (2023-11-27T20:19:22Z)
- Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training [0.1227734309612871]
We propose a series of data augmentation operations that train and test an automated scoring model to learn features and functions overlooked by previous works.
We achieve state-of-the-art performance in the Automated Student Assessment Prize dataset.
arXiv Detail & Related papers (2023-09-06T05:51:19Z)
- Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable a model to forget some of the information it has learned about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Toward Educator-focused Automated Scoring Systems for Reading and Writing [0.0]
This paper addresses the challenges of data and label availability, authentic and extended writing, domain scoring, prompt and source variety, and transfer learning.
It employs techniques that preserve essay length as an important feature without increasing model training costs.
arXiv Detail & Related papers (2021-12-22T15:44:30Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several datasets, using a variety of state-of-the-art benchmarks, demonstrates how our approach can achieve significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
arXiv Detail & Related papers (2020-10-24T08:36:49Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models (a simple probe of this behaviour is sketched after this list).
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
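The overstability result above suggests a quick sanity check worth running against any essay scorer, including ones trained on augmented data. Below is a small, hypothetical probe; `score` is assumed to be a `text -> float` wrapper around the model under test, and `filler_sentences` a pool of off-topic sentences, neither of which comes from the papers above.

```python
import random

def overstability_probe(score, essay, filler_sentences, fraction=0.25, seed=0):
    # Append off-topic filler worth ~`fraction` of the essay's length and
    # report both scores; a near-zero delta signals overstability.
    rng = random.Random(seed)
    target = int(len(essay.split()) * fraction)
    filler, words = [], 0
    while words < target:
        sentence = rng.choice(filler_sentences)
        filler.append(sentence)
        words += len(sentence.split())
    modified = essay + " " + " ".join(filler)
    return score(essay), score(modified)

# Usage (assumed interface around the model under test):
# base, perturbed = overstability_probe(model.score, essay, off_topic_pool)
# print(f"delta after ~25% off-topic content: {perturbed - base:+.2f}")
```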
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.