Improving Performance of Automated Essay Scoring by using
back-translation essays and adjusted scores
- URL: http://arxiv.org/abs/2203.00354v1
- Date: Tue, 1 Mar 2022 11:05:43 GMT
- Title: Improving Performance of Automated Essay Scoring by using
back-translation essays and adjusted scores
- Authors: You-Jin Jong (1), Yong-Jin Kim (2), Ok-Chol Ri (1) ((1) Kum Sung
Middle School Number 2, Pyongyang, D.P.R. of Korea, (2) Faculty of
Mathematics, KIM IL SUNG University, Pyongyang, D.P.R. of Korea)
- Abstract summary: We propose a method to increase the number of essay-score pairs using back-translation and score adjustment.
We evaluate the effectiveness of the augmented data using models from prior work.
Training on the augmented data improved the performance of the models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated essay scoring plays an important role in judging students'
language abilities in education. Traditional approaches score essays with
handcrafted features and are time-consuming and complicated. Recently, neural
network approaches have improved performance without any feature engineering.
Unlike other natural language processing tasks, however, only a small number of
datasets are publicly available for automated essay scoring, and they are not
sufficiently large. Because the performance of a neural network is closely
related to the size of its training data, this lack of data limits the
performance of automated essay scoring models. In this paper, we propose a
method to increase the number of essay-score pairs using back-translation and
score adjustment, and apply it to the Automated Student Assessment Prize (ASAP)
dataset for augmentation. We evaluate the effectiveness of the augmented data
using models from prior work, and additionally with a model based on long
short-term memory (LSTM), which is widely used for automated essay scoring.
Training on the augmented data improved the performance of the models.
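The abstract only names the two ingredients, back-translation and score adjustment, so the following is a minimal Python sketch of how such a pipeline could look. The MarianMT checkpoints from Hugging Face `transformers` and the vocabulary-overlap adjustment rule are assumptions for illustration; the abstract does not specify the paper's translation system or its adjustment rule.

```python
# Sketch of back-translation augmentation with adjusted scores.
# Assumed (not from the paper): MarianMT en<->de checkpoints as the
# translator, and a vocabulary-overlap heuristic as the score adjustment.
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

tok_fwd, mt_fwd = load("Helsinki-NLP/opus-mt-en-de")  # English -> German
tok_bwd, mt_bwd = load("Helsinki-NLP/opus-mt-de-en")  # German -> English

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch, max_new_tokens=512)
    return tok.batch_decode(out, skip_special_tokens=True)

def back_translate(essays):
    # Round trip en -> de -> en yields a paraphrase of each essay.
    return translate(translate(essays, tok_fwd, mt_fwd), tok_bwd, mt_bwd)

def adjust_score(score, original, paraphrase):
    # Hypothetical rule: scale the gold score by the fraction of the
    # original vocabulary the round trip preserved, so degraded
    # paraphrases receive proportionally lower scores.
    orig, para = set(original.lower().split()), set(paraphrase.lower().split())
    overlap = len(orig & para) / max(len(orig), 1)
    return round(score * overlap)

essays = ["The experiment shows that students write better with timely feedback."]
scores = [8]
paraphrases = back_translate(essays)
augmented = [(p, adjust_score(s, e, p))
             for e, s, p in zip(essays, scores, paraphrases)]
print(augmented)
```

One detail any real implementation would need to handle: each ASAP prompt has its own score range, so an adjustment rule would have to clamp adjusted scores to the valid range of the prompt the essay belongs to.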
Related papers
- Influence Scores at Scale for Efficient Language Data Sampling [3.072340427031969]
"influence scores" are used to identify important subsets of data.
In this paper, we explore the applicability of influence scores in language classification tasks.
arXiv Detail & Related papers (2023-11-27T20:19:22Z)
- Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training [0.1227734309612871]
We propose a series of data augmentation operations that train and test an automated scoring model to learn features and functions overlooked by previous works.
We achieve state-of-the-art performance in the Automated Student Assessment Prize dataset.
arXiv Detail & Related papers (2023-09-06T05:51:19Z)
- Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable a model to forget some of the information it has learned about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Toward Educator-focused Automated Scoring Systems for Reading and Writing [0.0]
This paper addresses the challenges of data and label availability, authentic and extended writing, domain scoring, prompt and source variety, and transfer learning.
It employs techniques that preserve essay length as an important feature without increasing model training costs.
arXiv Detail & Related papers (2021-12-22T15:44:30Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several datasets, using a variety of state-of-the-art benchmarks, demonstrates how our approach can achieve significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
arXiv Detail & Related papers (2020-10-24T08:36:49Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models (a simple probe of this behaviour is sketched after this list).
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
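The overstability result above suggests a quick sanity check worth running against any essay scorer, including ones trained on augmented data. Below is a small, hypothetical probe; `score` is assumed to be a `text -> float` wrapper around the model under test, and `filler_sentences` a pool of off-topic sentences, neither of which comes from the papers above.

```python
import random

def overstability_probe(score, essay, filler_sentences, fraction=0.25, seed=0):
    # Append off-topic filler worth ~`fraction` of the essay's length and
    # report both scores; a near-zero delta signals overstability.
    rng = random.Random(seed)
    target = int(len(essay.split()) * fraction)
    filler, words = [], 0
    while words < target:
        sentence = rng.choice(filler_sentences)
        filler.append(sentence)
        words += len(sentence.split())
    modified = essay + " " + " ".join(filler)
    return score(essay), score(modified)

# Usage (assumed interface around the model under test):
# base, perturbed = overstability_probe(model.score, essay, off_topic_pool)
# print(f"delta after ~25% off-topic content: {perturbed - base:+.2f}")
```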
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.