Reducing the Cost: Cross-Prompt Pre-Finetuning for Short Answer Scoring
- URL: http://arxiv.org/abs/2408.13966v1
- Date: Mon, 26 Aug 2024 00:23:56 GMT
- Title: Reducing the Cost: Cross-Prompt Pre-Finetuning for Short Answer Scoring
- Authors: Hiroaki Funayama, Yuya Asazuma, Yuichiroh Matsubayashi, Tomoya Mizumoto, Kentaro Inui
- Abstract summary: We train a model on existing rubrics and answers with gold score signals and finetune it on a new prompt.
Experiments show that finetuning on existing cross-prompt data with key phrases significantly improves scoring accuracy.
It is crucial to design the model so that it can learn the general properties of the task.
- Score: 17.1154345762798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated Short Answer Scoring (SAS) is the task of automatically scoring a given input to a prompt based on rubrics and reference answers. Although SAS is useful in real-world applications, both rubrics and reference answers differ between prompts, so new data must be acquired and a model trained for each new prompt. Such requirements are costly, especially for schools and online courses where resources are limited and only a few prompts are used. In this work, we attempt to reduce this cost through a two-phase approach: train a model on existing rubrics and answers with gold score signals, then finetune it on a new prompt. Specifically, given that scoring rubrics and reference answers differ for each prompt, we utilize key phrases, i.e., representative expressions that an answer should contain to receive a higher score, and train a SAS model to learn the relationship between key phrases and answers using already annotated prompts (i.e., cross-prompt data). Our experimental results show that pre-finetuning on existing cross-prompt data with key phrases significantly improves scoring accuracy, especially when the training data for the new prompt is limited. Finally, our extensive analysis shows that it is crucial to design the model so that it can learn the general properties of the task.
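The two-phase recipe above maps naturally onto a standard transformer regressor. The following is a minimal sketch, not the authors' released code: the dataset fields, model choice, and hyperparameters are illustrative assumptions. The key point it shows is that each answer is paired with its prompt's key phrases, both during cross-prompt pre-finetuning and during finetuning on the new prompt.

```python
# A minimal sketch (not the authors' released code) of the two-phase recipe:
# (1) pre-finetune a BERT-style regressor on answers from already-annotated
#     prompts, pairing each answer with the key phrases from its rubric, then
# (2) finetune the same model on the few labeled answers of the new prompt.
# The data format ({"answer", "key_phrases", "score"}), model choice, and
# hyperparameters are illustrative assumptions, not values from the paper.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)  # single output = regression on the score

def train(examples, epochs, lr, batch_size=16):
    """Fit the scorer on a list of {"answer", "key_phrases", "score"} dicts."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for i in range(0, len(examples), batch_size):
            batch = examples[i:i + batch_size]
            # Answer and key phrases are encoded as a sentence pair so the
            # model can learn their relationship, which transfers across prompts.
            enc = tokenizer([ex["answer"] for ex in batch],
                            [ex["key_phrases"] for ex in batch],
                            truncation=True, padding=True, return_tensors="pt")
            target = torch.tensor([ex["score"] for ex in batch],
                                  dtype=torch.float).unsqueeze(1)
            loss = torch.nn.functional.mse_loss(model(**enc).logits, target)
            optim.zero_grad()
            loss.backward()
            optim.step()

# Toy placeholder data; in practice these come from annotated SAS datasets.
cross_prompt_examples = [
    {"answer": "Photosynthesis converts light energy into chemical energy.",
     "key_phrases": "light energy; chemical energy", "score": 2.0},
]
new_prompt_examples = [
    {"answer": "Evaporation turns liquid water into vapor.",
     "key_phrases": "liquid to vapor; heat from the sun", "score": 1.0},
]

# Phase 1: cross-prompt pre-finetuning on all previously annotated prompts.
train(cross_prompt_examples, epochs=3, lr=2e-5)
# Phase 2: finetuning on the (small) labeled data of the new target prompt.
train(new_prompt_examples, epochs=5, lr=2e-5)
```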
Related papers
- Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback [3.2734777984053887]
We propose a modular retrieval-augmented generation (RAG) based ASAS-F system that scores answers and generates feedback in strict zero-shot and few-shot learning scenarios.
Results show an improvement in scoring accuracy by 9% on unseen questions compared to fine-tuning, offering a scalable and cost-effective solution.
arXiv Detail & Related papers (2024-09-30T07:48:55Z)
- Harnessing the Power of Prompt-based Techniques for Generating School-Level Questions using Large Language Models [0.5459032912385802]
We propose a novel approach that utilizes prompt-based techniques to generate descriptive and reasoning-based questions.
We curate a new QG dataset called EduProbe for school-level subjects, by leveraging the rich content of NCERT textbooks.
We investigate several prompt-based QG methods by fine-tuning transformer-based large language models.
arXiv Detail & Related papers (2023-12-02T05:13:28Z)
- Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z)
- A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models [30.128204719490856]
We aim to automate prompt engineering and improve zero-shot accuracy through prompt ensembling.
We identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data.
Using our proposed scoring method to create a weighted-average prompt ensemble, our method outperforms the equal-average ensemble as well as hand-crafted prompts.
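As a rough illustration of how such a weighted prompt ensemble can be assembled (this is not the paper's scoring method; the embeddings and weights below are placeholders), one can average the per-prompt text embeddings of each class with the prompt scores as weights before matching against image embeddings:

```python
# A rough sketch of weighted prompt ensembling for a CLIP-style zero-shot
# classifier (not the paper's exact scoring method). It assumes text
# embeddings for each (class, prompt template) pair and per-prompt weights
# have already been computed; here they are random stand-ins.

import numpy as np

def weighted_prompt_ensemble(text_emb, prompt_weights):
    """text_emb: [num_classes, num_prompts, dim] L2-normalized text embeddings.
    prompt_weights: [num_prompts] non-negative scores for each prompt template.
    Returns one ensembled, re-normalized embedding per class."""
    w = prompt_weights / prompt_weights.sum()                  # normalize weights
    class_emb = (text_emb * w[None, :, None]).sum(axis=1)      # weighted average
    return class_emb / np.linalg.norm(class_emb, axis=-1, keepdims=True)

def classify(image_emb, class_emb):
    """Pick the class whose ensembled text embedding is most similar."""
    return (image_emb @ class_emb.T).argmax(axis=-1)

# Toy usage with random stand-ins for real text/image embeddings.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(10, 4, 512))
text_emb /= np.linalg.norm(text_emb, axis=-1, keepdims=True)
class_emb = weighted_prompt_ensemble(text_emb, np.array([0.4, 0.3, 0.2, 0.1]))
preds = classify(rng.normal(size=(2, 512)), class_emb)
```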
arXiv Detail & Related papers (2023-02-13T10:19:58Z)
- Global Constraints with Prompting for Zero-Shot Event Argument Classification [49.84347224233628]
We propose to use global constraints with prompting to tackle event argument classification without any annotation and task-specific training.
A pre-trained language model scores the new passages, making the initial prediction.
Our novel prompt templates can easily adapt to all events and argument types without manual effort.
arXiv Detail & Related papers (2023-02-09T06:39:29Z)
- Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt [56.22456716092954]
Retrieval of soft prompts can efficiently assist hard prompts in zero-shot task generalization.
We train a soft prompt embedding for each prompt through prompt tuning, store samples of the training instances mapped to their prompt embeddings, and, during inference, retrieve the prompt embedding of the training instance closest to the query instance.
While adding only 0.007% additional parameters, retrieval of soft prompts enhances the performance of T0 on unseen tasks, outperforming it on 10 out of 11 datasets and improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39 percentage points.
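A simplified sketch of the retrieval step described above, with a toy text encoder and randomly initialized soft prompts standing in for the learned ones (the store layout and function names are assumptions, not the paper's code):

```python
# A simplified sketch of soft-prompt retrieval (not the paper's implementation).
# It assumes soft prompt embeddings have already been learned per training
# prompt and that embed() maps an instance's text to a vector; both are
# stand-ins here.

import numpy as np

rng = np.random.default_rng(0)

def embed(text):
    # Toy encoder: the same text yields the same vector within a run.
    return np.random.default_rng(abs(hash(text)) % (2**32)).normal(size=128)

# Store: one entry per training instance, mapped to its prompt's soft prompt.
store = [
    {"instance_vec": embed("premise: ... hypothesis: ..."),
     "soft_prompt": rng.normal(size=(20, 768))},   # 20 learned prompt tokens
    {"instance_vec": embed("question: ... context: ..."),
     "soft_prompt": rng.normal(size=(20, 768))},
]

def retrieve_soft_prompt(query_text):
    """Return the soft prompt of the stored training instance closest to the query."""
    q = embed(query_text)
    sims = [q @ e["instance_vec"] /
            (np.linalg.norm(q) * np.linalg.norm(e["instance_vec"]))
            for e in store]
    return store[int(np.argmax(sims))]["soft_prompt"]

# At inference, the retrieved soft prompt would be prepended to the hard
# prompt's token embeddings before running the frozen instruction-following model.
soft = retrieve_soft_prompt("question: ... context: ...")
```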
arXiv Detail & Related papers (2022-10-06T16:26:03Z)
- How Many Data Points is a Prompt Worth? [106.76346863035786]
Proponents of prompting argue that it provides a method for injecting task-specific guidance.
We compare prompted and head-based fine-tuning in equal conditions across many tasks and data sizes.
Results show that prompting is often worth hundreds of data points on average across classification tasks.
arXiv Detail & Related papers (2021-03-15T16:10:23Z)
- Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers [63.835172924290326]
We present a fast, scalable, and accurate approach to automated Short Answer Scoring (SAS).
We propose and explain the design and development of a system for SAS, namely AutoSAS.
AutoSAS shows state-of-the-art performance and improves results by over 8% on some of the question prompts.
arXiv Detail & Related papers (2020-12-21T10:47:30Z)
- Generating Dialogue Responses from a Semantic Latent Space [75.18449428414736]
We propose an alternative to end-to-end classification over the vocabulary.
We learn the pair relationship between the prompts and responses as a regression task on a latent space.
Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative.
arXiv Detail & Related papers (2020-10-04T19:06:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.