Get It Scored Using AutoSAS -- An Automated System for Scoring Short
Answers
- URL: http://arxiv.org/abs/2012.11243v1
- Date: Mon, 21 Dec 2020 10:47:30 GMT
- Authors: Yaman Kumar, Swati Aggarwal, Debanjan Mahata, Rajiv Ratn Shah,
Ponnurangam Kumaraguru, Roger Zimmermann
- Abstract summary: We present a fast, scalable, and accurate approach towards automated Short Answer Scoring (SAS).
We propose and explain the design and development of a system for SAS, namely AutoSAS.
AutoSAS shows state-of-the-art performance and achieves better results by over 8% in some of the question prompts.
- Score: 63.835172924290326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of MOOCs, online exams are taken by millions of candidates, and
scoring short answers is an integral part of these exams. Evaluating these
answers with human graders becomes intractable. Thus, a generic automated system
capable of grading these responses should be designed and deployed. In this paper, we present a
fast, scalable, and accurate approach towards automated Short Answer Scoring
(SAS). We propose and explain the design and development of a system for SAS,
namely AutoSAS. Given a question along with its graded samples, AutoSAS can
learn to grade that prompt successfully. This paper further lays out the
features, such as lexical diversity, Word2Vec, prompt, and content overlap, that
play a pivotal role in building our proposed model. We also present a
methodology for indicating the factors responsible for scoring an answer. The
trained model is evaluated on an extensively used public dataset, namely
Automated Student Assessment Prize Short Answer Scoring (ASAP-SAS). AutoSAS
shows state-of-the-art performance and achieves better results by over 8% in
some of the question prompts as measured by Quadratic Weighted Kappa (QWK),
showing performance comparable to humans.
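The abstract reports results in terms of Quadratic Weighted Kappa (QWK). As a minimal sketch of how this agreement metric is commonly computed between human and system scores, the pure-Python function below uses the standard textbook definition. The function name, argument names, and sample scores are illustrative assumptions and are not taken from the AutoSAS paper or its code.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Standard QWK between two lists of integer scores (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_rating - min_rating + 1  # number of distinct score levels
    if n < 2:
        return 1.0  # a single score level leaves no room for disagreement
    total = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts answers that
    # rater A scored i and rater B scored j (offsets from min_rating).
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        return 1.0  # both raters gave one identical score to every answer
    return 1.0 - numerator / denominator


# Hypothetical scores for four answers on a 0-2 scale.
human_scores = [0, 1, 2, 2]
model_scores = [0, 1, 1, 2]
print(quadratic_weighted_kappa(human_scores, model_scores, 0, 2))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the penalty is quadratic, a prediction that misses by two score points costs four times as much as one that misses by a single point.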
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
(2024-06-26T05:30:21Z)
- AutoSurvey: Large Language Models Can Automatically Write Surveys [77.0458309675818]
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys.
Traditional survey paper creation faces challenges due to the vast volume and complexity of information.
Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
(2024-06-10T12:56:06Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
(2024-05-31T20:15:10Z)
- Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG on an automatically-generated synthetic exam composed of multiple choice questions.
(2024-05-22T13:14:11Z)
- Using Sampling to Estimate and Improve Performance of Automated Scoring
Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, intelligently sampling responses to be scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
(2021-11-17T05:00:51Z)
- Text similarity analysis for evaluation of descriptive answers [0.0]
This paper proposes a text-analysis-based approach for automatic evaluation of descriptive answers in an examination.
In this architecture, the examiner creates a sample answer sheet for a given set of questions.
Using the concepts of text summarization, text semantics, and keyword summarization, the final score for each answer is calculated.
(2021-05-06T20:19:58Z)
- Stacking Neural Network Models for Automatic Short Answer Scoring [0.0]
We propose a stacking model based on a neural network and XGBoost for classification, using sentence embedding features.
The best model obtained an F1-score of 0.821, exceeding previous work on the same dataset.
(2020-10-21T16:00:09Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring
Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
(2020-07-14T03:49:43Z)