Automated Scoring for Reading Comprehension via In-context BERT Tuning
- URL: http://arxiv.org/abs/2205.09864v2
- Date: Thu, 15 Jun 2023 04:37:17 GMT
- Title: Automated Scoring for Reading Comprehension via In-context BERT Tuning
- Authors: Nigel Fernandez, Aritra Ghosh, Naiming Liu, Zichao Wang, Benoît Choffin, Richard Baraniuk, Andrew Lan
- Abstract summary: In this paper, we report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension.
Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items with a carefully designed input structure.
We demonstrate the effectiveness of our approach via local evaluations using the training dataset provided by the challenge.
- Score: 9.135673900486827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated scoring of open-ended student responses has the potential to
significantly reduce human grader effort. Recent advances in automated scoring
often leverage textual representations based on pre-trained language models
such as BERT and GPT as input to scoring models. Most existing approaches train
a separate model for each item/question, which is suitable for scenarios such
as essay scoring where items can be quite different from one another. However,
these approaches have two limitations: 1) they fail to leverage item linkage
for scenarios such as reading comprehension where multiple items may share a
reading passage; 2) they are not scalable since storing one model per item
becomes difficult when models have a large number of parameters. In this paper,
we report our (grand prize-winning) solution to the National Assessment of
Educational Progress (NAEP) automated scoring challenge for reading
comprehension. Our approach, in-context BERT fine-tuning, produces a single
shared scoring model for all items with a carefully designed input structure to
provide contextual information on each item. We demonstrate the effectiveness
of our approach via local evaluations using the training dataset provided by
the challenge. We also discuss the biases, common error types, and limitations
of our approach.
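The abstract contrasts one model per item with a single shared model whose input carries item context. The sketch below illustrates only that general idea. The layout (item ID, question text, optional scored example responses, then the student response, joined by BERT's `[SEP]` token) and every name in it (`Item`, `build_input`, `build_dataset`, the sample question) are hypothetical assumptions, not the authors' exact input structure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SEP = "[SEP]"  # BERT's separator token; the layout built below is hypothetical


@dataclass
class Item:
    """One assessment item (question). Field names are illustrative assumptions."""
    item_id: str
    question: str
    # Hypothetical: a few already-scored responses, as (score, text) pairs,
    # shown to the model as in-context examples for this item.
    examples: List[Tuple[int, str]] = field(default_factory=list)


def build_input(item: Item, response: str, max_examples: int = 2) -> str:
    """Prefix a student response with item context for a single shared scorer."""
    parts = [item.item_id, item.question]
    for score, text in item.examples[:max_examples]:
        parts.append(f"score {score}: {text}")
    parts.append(response.strip())
    return f" {SEP} ".join(parts)


def build_dataset(rows: List[Tuple[Item, str, int]]) -> List[Tuple[str, int]]:
    """Pool (item, response, human score) rows from all items into one training set."""
    return [(build_input(item, response), score) for item, response, score in rows]


if __name__ == "__main__":
    q1 = Item("item_1", "Why did the narrator leave?", [(2, "She wanted to find her brother.")])
    print(build_input(q1, " To find her brother "))
```

Because every row carries its own item context, responses to different items (including items that share a reading passage) can be pooled into one training set for a single BERT classifier, so only one set of weights has to be stored. Tokenization and the classifier itself are omitted from this sketch.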
Related papers
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
Date: 2024-05-31T20:15:10Z
- JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction [59.94977231327573]
We propose JPAVE, a multi-task learning model with value generation/classification and attribute prediction.
Two variants of our model are designed for open-world and closed-world scenarios.
Experimental results on a public dataset demonstrate the superiority of our model compared with strong baselines.
Date: 2023-11-07T18:36:16Z
- Short Answer Grading Using One-shot Prompting and Text Similarity Scoring Model [2.14986347364539]
We developed an automated short answer grading model that provided both analytic scores and holistic scores.
The accuracy and quadratic weighted kappa of our model were 0.67 and 0.71 on a subset of the publicly available ASAG dataset.
Date: 2023-05-29T22:05:29Z
- Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not adequately capture the above dimensions.
We propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects.
Date: 2023-03-27T10:40:59Z
- Arguments to Key Points Mapping with Prompt-based Learning [0.0]
We propose two approaches to the argument-to-keypoint mapping task.
The first approach is to incorporate prompt engineering for fine-tuning the pre-trained language models.
The second approach utilizes prompt-based learning in PLMs to generate intermediary texts.
Date: 2022-11-28T01:48:29Z
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noise level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
Date: 2022-04-27T04:24:35Z
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy.
Date: 2021-09-24T03:49:38Z
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
Date: 2020-06-15T21:31:05Z
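The short-answer grading entry above reports quadratic weighted kappa (QWK), a standard agreement metric between human and model scores in automated scoring. For integer scores $0,\dots,K-1$ it is $\kappa = 1 - \sum_{i,j} w_{ij} O_{ij} / \sum_{i,j} w_{ij} E_{ij}$ with $w_{ij} = (i-j)^2/(K-1)^2$, where $O$ is the observed score-pair count matrix and $E$ the chance-expected counts from the two marginal histograms. The pure-Python sketch below implements that formula; the function name is ours, and in practice scikit-learn's `cohen_kappa_score(..., weights="quadratic")` computes the same quantity.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], num_labels: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for scores 0..num_labels-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    if num_labels < 2:
        raise ValueError("num_labels must be at least 2")
    if any(not 0 <= s < num_labels for s in list(y_true) + list(y_pred)):
        raise ValueError("scores must lie in 0..num_labels-1")

    # Observed matrix: observed[i][j] counts responses with human score i and model score j.
    observed: List[List[float]] = [[0.0] * num_labels for _ in range(num_labels)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal score histograms for each rater.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_labels)) for j in range(num_labels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n  # counts expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    # Degenerate case: both raters used one identical score everywhere, so kappa
    # is undefined; this sketch returns 1.0 by convention.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1.0 means perfect agreement, 0 means chance-level agreement, and negative values mean systematic disagreement; the quadratic weights penalize a two-point miss four times as much as a one-point miss.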