Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers
- URL: http://arxiv.org/abs/2012.11243v1
- Date: Mon, 21 Dec 2020 10:47:30 GMT
- Title: Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers
- Authors: Yaman Kumar, Swati Aggarwal, Debanjan Mahata, Rajiv Ratn Shah, Ponnurangam Kumaraguru, Roger Zimmermann
- Abstract summary: We present a fast, scalable, and accurate approach to automated Short Answer Scoring (SAS).
We propose and explain the design and development of a system for SAS, namely AutoSAS.
AutoSAS shows state-of-the-art performance, improving results by over 8% on some of the question prompts.
- Score: 63.835172924290326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of MOOCs, online exams are taken by millions of candidates, and scoring their short answers is an integral part of the process. Evaluating these responses with human graders alone is intractable at that scale, so a generic automated system capable of grading them should be designed and deployed. In this paper, we present a fast, scalable, and accurate approach to automated Short Answer Scoring (SAS). We propose and explain the design and development of a system for SAS, namely AutoSAS. Given a question along with its graded samples, AutoSAS can learn to grade that prompt successfully. This paper further lays out the features, such as lexical diversity, Word2Vec embeddings, and prompt and content overlap, that play a pivotal role in building our proposed model. We also present a methodology for indicating the factors responsible for an answer's score. The trained model is evaluated on an extensively used public dataset, namely the Automated Student Assessment Prize Short Answer Scoring (ASAP-SAS) dataset. AutoSAS shows state-of-the-art performance, improving results by over 8% on some question prompts as measured by Quadratic Weighted Kappa (QWK), and performs comparably to human graders.
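To make the feature families above concrete, the sketch below computes type-token lexical diversity, a mean Word2Vec answer embedding, and word overlap with the prompt, then trains a classifier and reports QWK, the metric used on ASAP-SAS. It is a minimal illustration under assumed choices: the toy data, the random-forest classifier, and every function name are assumptions, not the paper's implementation.

```python
# Minimal sketch of an AutoSAS-style feature pipeline (assumed, not the paper's code).
# Dependencies: numpy, gensim, scikit-learn.
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score

def lexical_diversity(tokens):
    # Type-token ratio: distinct words / total words.
    return len(set(tokens)) / max(len(tokens), 1)

def overlap(tokens, reference):
    # Jaccard overlap between answer words and prompt/reference words.
    a, b = set(tokens), set(reference)
    return len(a & b) / max(len(a | b), 1)

def embed(tokens, w2v):
    # Mean Word2Vec vector over in-vocabulary tokens.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def featurize(answer, prompt, w2v):
    toks, ptoks = answer.lower().split(), prompt.lower().split()
    return np.concatenate(([lexical_diversity(toks), overlap(toks, ptoks)],
                           embed(toks, w2v)))

# Toy graded samples for a single prompt (hypothetical data).
prompt = "explain why plants need sunlight"
answers = ["plants use sunlight for photosynthesis to make their food",
           "sunlight helps plants grow",
           "i do not know"]
gold = [2, 1, 0]

w2v = Word2Vec([a.split() for a in answers], vector_size=16, min_count=1, seed=0)
X = np.stack([featurize(a, prompt, w2v) for a in answers])
clf = RandomForestClassifier(random_state=0).fit(X, gold)

# Quadratic Weighted Kappa, the agreement metric reported on ASAP-SAS.
pred = clf.predict(X)
print(cohen_kappa_score(gold, pred, weights="quadratic"))
```

In practice, the graded samples for each prompt would form the training set and QWK would be computed against held-out human scores; a feature-importance readout from the classifier is one hedged way to approximate the paper's "factors responsible for a score."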
Related papers
- Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback [3.2734777984053887]
We propose a modular retrieval-augmented generation (RAG) based ASAS-F system that scores answers and generates feedback in strict zero-shot and few-shot learning scenarios.
Results show a 9% improvement in scoring accuracy on unseen questions compared to fine-tuning, offering a scalable and cost-effective solution.
arXiv Detail & Related papers (2024-09-30T07:48:55Z)
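One plausible reading of the modular RAG pipeline above: retrieve the graded sample answers most similar to the new answer and pass them to an LLM as few-shot exemplars, asking for a score plus feedback. The sketch below assumes hypothetical `retrieve` and `llm` callables; it is not the paper's implementation.

```python
# Hedged sketch of few-shot retrieval-augmented answer scoring with feedback.
# `retrieve` and `llm` are hypothetical stand-ins, not an API from the paper.
from typing import Callable, List, Tuple

def score_with_rag(question: str, answer: str,
                   retrieve: Callable[[str, int], List[Tuple[str, int]]],
                   llm: Callable[[str], str]) -> str:
    # Few-shot case: fetch the 3 graded sample answers most similar to this one.
    # In the strict zero-shot setting this exemplar list would be empty.
    shots = "\n".join(f"Answer: {a}\nScore: {s}" for a, s in retrieve(answer, 3))
    prompt = (f"Question: {question}\n"
              f"Graded examples:\n{shots}\n"
              f"Student answer: {answer}\n"
              "Return a score and one sentence of feedback.")
    return llm(prompt)
```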

- ASAG2024: A Combined Benchmark for Short Answer Grading [0.10826342457160269]
Short Answer Grading (SAG) systems aim to automatically score students' answers.
There exists no comprehensive short-answer grading benchmark across different subjects, grading scales, and distributions.
We introduce the combined ASAG2024 benchmark to facilitate the comparison of automated grading systems.
arXiv Detail & Related papers (2024-09-27T09:56:02Z)

- Reducing the Cost: Cross-Prompt Pre-Finetuning for Short Answer Scoring [17.1154345762798]
We train a model on existing rubrics and answers with gold score signals and finetune it on a new prompt.
Experiments show that finetuning on existing cross-prompt data with key phrases significantly improves scoring accuracy.
It is crucial to design the model so that it can learn the general properties of the task.
arXiv Detail & Related papers (2024-08-26T00:23:56Z)
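A minimal sketch of the two-stage recipe summarized above: pre-finetune a scorer on pooled cross-prompt rubric key phrases and answers with gold scores, then finetune it on the few graded answers available for the new prompt. The hashed bag-of-words encoder and linear scorer below are toy stand-ins, not the paper's model.

```python
# Two-stage sketch: pre-finetune on cross-prompt data, then finetune on a new prompt.
# The hashed bag-of-words encoder and linear scorer are toy placeholders.
import torch
import torch.nn as nn

def encode(text, dim=64):
    # Toy featurizer; a real system would encode rubric key phrases + answer with an LM.
    v = torch.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def train(model, pairs, epochs=50, lr=0.05):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for text, score in pairs:
            loss = (model(encode(text))[0] - score) ** 2
            opt.zero_grad(); loss.backward(); opt.step()

model = nn.Linear(64, 1)  # stand-in for a finetunable scoring model

# Stage 1: pooled data from existing prompts ("key phrases | answer", gold score).
cross_prompt = [("photosynthesis sunlight | plants make food from light", 2.0),
                ("photosynthesis sunlight | plants are green", 0.0)]
train(model, cross_prompt)

# Stage 2: finetune on the few graded answers available for the new prompt.
new_prompt = [("mitosis division | a cell divides into two copies", 2.0)]
train(model, new_prompt, epochs=20)
```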

- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
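The self-training framework above can be sketched generically: pseudo-label unannotated reviews with the current model, keep only the pseudo-labels the scorer rates as well matched, and retrain on the enlarged set. The loop below, including the 0.8 threshold and all names, is an assumed illustration rather than the paper's code.

```python
# Generic self-training loop with a pseudo-label scorer (assumed illustration).
from typing import Callable, List, Tuple

def self_train(fit: Callable[[List[Tuple[str, str]]], Callable[[str], str]],
               scorer: Callable[[str, str], float],  # match quality of (review, label)
               labeled: List[Tuple[str, str]],
               unlabeled: List[str],
               rounds: int = 3, threshold: float = 0.8):
    model = fit(labeled)
    for _ in range(rounds):
        # Pseudo-label the unlabeled reviews with the current model.
        pseudo = [(review, model(review)) for review in unlabeled]
        # Keep only pairs the scorer judges as well matched.
        kept = [(r, y) for r, y in pseudo if scorer(r, y) >= threshold]
        model = fit(labeled + kept)  # retrain on gold + filtered pseudo-labels
    return model
```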

- AutoSurvey: Large Language Models Can Automatically Write Surveys [77.0458309675818]
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys.
Traditional survey paper creation faces challenges due to the vast volume and complexity of information.
Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
arXiv Detail & Related papers (2024-06-10T12:56:06Z)

- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)

- Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG on an automatically-generated synthetic exam composed of multiple choice questions.
arXiv Detail & Related papers (2024-05-22T13:14:11Z)
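The exam-based protocol above reduces to a simple loop once the exam exists: pose each generated multiple-choice question to the RAG system and report accuracy. In the sketch below, `generate_exam` and `rag_answer` are hypothetical stand-ins for the paper's components.

```python
# Sketch of exam-based RAG evaluation: accuracy over auto-generated multiple-choice
# questions. `generate_exam` and `rag_answer` are hypothetical stand-ins.
from typing import Callable, List, Tuple

def exam_accuracy(corpus: List[str],
                  generate_exam: Callable[[List[str]], List[Tuple[str, List[str], int]]],
                  rag_answer: Callable[[str, List[str]], int]) -> float:
    exam = generate_exam(corpus)  # (question, choices, index of the correct choice)
    correct = sum(rag_answer(q, choices) == gold for q, choices, gold in exam)
    return correct / len(exam)
```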

- Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, sampling responses to be scored by humans intelligently.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
arXiv Detail & Related papers (2021-11-17T05:00:51Z)
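One plausible instantiation of "sampling responses to be scored by humans intelligently" is confidence-based routing: let the model score everything, then spend the human budget on the responses it is least sure about. The sketch below assumes a probabilistic scorer; it is an illustration, not the paper's sampling strategy.

```python
# Confidence-based routing sketch: spend the human budget on the least-confident
# machine scores. An assumed illustration, not the paper's sampling strategy.
from typing import Callable
import numpy as np

def hybrid_scores(proba: np.ndarray, machine: np.ndarray,
                  human_label: Callable[[int], int], budget: int) -> np.ndarray:
    # proba: (n_responses, n_score_levels) model probabilities; machine: argmax scores.
    confidence = proba.max(axis=1)
    to_human = np.argsort(confidence)[:budget]  # least confident responses first
    scores = machine.copy()
    for i in to_human:
        scores[i] = human_label(i)  # human grade replaces the uncertain machine score
    return scores
```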

- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
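The overstability finding above suggests a simple probe: append off-topic content amounting to roughly 25% of the essay and check how little the score moves. The sketch below assumes a generic `score` function and is an illustration, not the toolkit's actual test.

```python
# Overstability probe sketch: append off-topic text (~25% of the essay) and measure
# the score change. `score` is any AES model under test; names are assumptions.
from typing import Callable

def overstability_gap(score: Callable[[str], float],
                      essay: str, off_topic: str) -> float:
    words = essay.split()
    k = max(1, len(words) // 4)  # roughly 25% of the essay's length
    modified = " ".join(words + off_topic.split()[:k])
    return score(essay) - score(modified)  # near zero indicates overstability
```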
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.