Using Active Learning Methods to Strategically Select Essays for
Automated Scoring
- URL: http://arxiv.org/abs/2301.00628v2
- Date: Thu, 13 Apr 2023 23:17:58 GMT
- Title: Using Active Learning Methods to Strategically Select Essays for
Automated Scoring
- Authors: Tahereh Firoozi, Hamid Mohammadi, Mark J. Gierl
- Abstract summary: The purpose of this study is to describe and evaluate three active learning methods.
The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method.
All three methods produced strong results, with the topological-based method producing the most efficient classification.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Research on automated essay scoring has become increasingly important because
it serves as a method for evaluating students' written responses at scale.
Scalable methods for scoring written responses are needed as students migrate
to online learning environments, where large numbers of written-response
assessments must be evaluated. The purpose of this study is to describe and
evaluate three active learning methods that can be used to minimize the number
of essays that must be scored by human raters while still providing the data
needed to train a modern automated essay scoring system. The three active
learning methods are the uncertainty-based, the topological-based, and the
hybrid method. These three methods were used to select essays from the
Automated Student Assessment Prize competition, which were then classified
using a scoring model trained with the Bidirectional Encoder Representations
from Transformers (BERT) language model. All three active learning
methods produced strong results, with the topological-based method producing
the most efficient classification. Growth rate accuracy was also evaluated. The
active learning methods produced different levels of efficiency under different
sample size allocations but, overall, all three methods were highly efficient
and produced classifications that were similar to one another.
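As an illustration of the uncertainty-based selection step, the sketch below shows one common way such a pool-based active learning loop might be set up. It is a minimal sketch under stated assumptions, not the study's implementation: a TF-IDF plus logistic-regression scorer stands in for the BERT-based scoring model, and the essay texts and scores are placeholders, not data from the competition.

# Minimal sketch of uncertainty-based essay selection (illustrative only).
# A TF-IDF + logistic regression scorer stands in for the paper's BERT-based
# scoring model; the essays and scores below are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def select_uncertain_essays(scorer, features, batch_size):
    """Return indices of the essays whose predicted score distribution
    has the highest entropy, i.e. where the scorer is least certain."""
    probs = scorer.predict_proba(features)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:batch_size]


# Placeholder seed set (human-scored) and unscored essay pool.
seed_essays = ["a short persuasive essay about school uniforms"] * 10 + \
              ["a longer narrative essay with detailed examples"] * 10
seed_scores = [1] * 10 + [2] * 10
pool_essays = ["an unscored essay awaiting a rating"] * 200

vectorizer = TfidfVectorizer()
X_seed = vectorizer.fit_transform(seed_essays)
X_pool = vectorizer.transform(pool_essays)

# Train the stand-in scorer on the human-scored seed essays, then pick the
# pool essays to route to human raters in the next labeling round.
scorer = LogisticRegression(max_iter=1000).fit(X_seed, seed_scores)
next_batch = select_uncertain_essays(scorer, X_pool, batch_size=10)

In this kind of loop, the newly rated essays are added to the seed set, the scorer is retrained, and selection repeats until the scoring model reaches the desired accuracy with as few human-scored essays as possible.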
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z) - Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights [0.412484724941528]
We introduce a simple yet effective knowledge distillation method to improve the performance of small language models.
Our approach utilizes a teacher model with approximately 3 billion parameters to identify the most influential tokens in its decision-making process.
This method has proven to be effective, as demonstrated by testing it on four diverse datasets.
arXiv Detail & Related papers (2024-09-19T09:09:53Z) - Active Transfer Learning for Efficient Video-Specific Human Pose
Estimation [16.415080031134366]
Human Pose (HP) estimation is actively researched because of its wide range of applications.
We present our approach combining Active Learning (AL) and Transfer Learning (TL) to adapt HP estimators to individual video domains efficiently.
arXiv Detail & Related papers (2023-11-08T21:56:29Z) - Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z) - Saliency Cards: A Framework to Characterize and Compare Saliency Methods [34.38335172204263]
Saliency methods calculate how important each input feature is to a model's output.
Existing approaches assume universal desiderata for saliency methods that do not account for diverse user needs.
We introduce saliency cards: structured documentation of how saliency methods operate and their performance.
arXiv Detail & Related papers (2022-06-07T01:21:49Z) - Partner-Assisted Learning for Few-Shot Image Classification [54.66864961784989]
Few-shot Learning has been studied to mimic human visual capabilities and learn effective models without the need for exhaustive human annotation.
In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
We propose a two-stage training scheme, which first trains a partner encoder to model pair-wise similarities and extract features serving as soft-anchors, and then trains a main encoder by aligning its outputs with soft-anchors while attempting to maximize classification performance.
arXiv Detail & Related papers (2021-09-15T22:46:19Z) - Automating Document Classification with Distant Supervision to Increase
the Efficiency of Systematic Reviews [18.33687903724145]
Well-done systematic reviews are expensive, time-demanding, and labor-intensive.
We propose an automatic document classification approach to significantly reduce the effort in reviewing documents.
arXiv Detail & Related papers (2020-12-09T22:45:40Z) - Hierarchical Bi-Directional Self-Attention Networks for Paper Review
Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z) - The World is Not Binary: Learning to Rank with Grayscale Data for
Dialogue Response Selection [55.390442067381755]
We show that grayscale data can be automatically constructed without human effort.
Our method employs off-the-shelf response retrieval models and response generation models as automatic grayscale data generators.
Experiments on three benchmark datasets and four state-of-the-art matching models show that the proposed approach brings significant and consistent performance improvements.
arXiv Detail & Related papers (2020-04-06T06:34:54Z) - PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative
Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods to evaluate open-domain generative dialogue systems.
Due to the lack of systematic comparison, it is not clear which kind of metric is more effective.
We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)