Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
- URL: http://arxiv.org/abs/2305.16826v1
- Date: Fri, 26 May 2023 11:11:19 GMT
- Title: Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
- Authors: Heejin Do, Yunsu Kim, Gary Geunbae Lee
- Abstract summary: Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic.
Most existing AES systems assume that test essays share the prompt used in training and assign only a holistic score.
We propose a robust model: prompt- and trait relation-aware cross-prompt essay trait scorer.
- Score: 3.6825890616838066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated essay scoring (AES) aims to score essays written for a given
prompt, which defines the writing topic. Most existing AES systems assume that
test essays share the prompt used in training and assign only a holistic
score. However, such settings conflict with real educational situations;
pre-graded essays for a particular prompt are lacking, and detailed trait
scores of sub-rubrics are required. Thus, predicting various trait scores of
unseen-prompt essays (called cross-prompt essay trait scoring) is a remaining
challenge of AES. In this paper, we propose a robust model: prompt- and trait
relation-aware cross-prompt essay trait scorer. We encode a prompt-aware essay
representation via essay-prompt attention and a topic-coherence feature
extracted by a topic-modeling mechanism that requires no labeled data;
therefore, our model considers an essay's prompt adherence, even in
a cross-prompt setting. To facilitate multi-trait scoring, we design
a trait-similarity loss that encapsulates the correlations among traits.
Experiments demonstrate the efficacy of our model, which achieves
state-of-the-art results for all prompts and traits. Significant improvements
for low-resource prompts and for weaker traits further indicate our model's
strength.
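The abstract only characterizes the trait-similarity loss as encapsulating the correlations of traits. As a rough, hypothetical illustration (not the authors' released code), the PyTorch-style sketch below shows one way such a regularizer could be formulated: it penalizes the gap between the pairwise Pearson-correlation matrices of the predicted and gold trait scores within a batch. The function names, the correlation-matching formulation, and the weight lambda_ts are assumptions introduced here for exposition.

```python
# Hypothetical sketch of a trait-similarity regularizer; names and formulation
# are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F


def pairwise_correlation(scores: torch.Tensor) -> torch.Tensor:
    """Pearson correlation matrix over traits.

    scores: (batch, num_traits) -> (num_traits, num_traits)
    """
    centered = scores - scores.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (scores.shape[0] - 1)
    std = scores.std(dim=0, unbiased=True).clamp_min(1e-8)
    return cov / (std.unsqueeze(1) * std.unsqueeze(0))


def trait_similarity_loss(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between predicted and gold trait-correlation structure."""
    return (pairwise_correlation(pred) - pairwise_correlation(gold)).pow(2).mean()


def total_loss(pred: torch.Tensor, gold: torch.Tensor, lambda_ts: float = 0.1) -> torch.Tensor:
    # Per-trait MSE as the main multi-trait objective, plus the similarity penalty.
    return F.mse_loss(pred, gold) + lambda_ts * trait_similarity_loss(pred, gold)
```

With predicted and gold scores of shape (batch, num_traits), total_loss reduces to plain per-trait MSE when lambda_ts is 0 and otherwise nudges the model to reproduce the inter-trait correlations observed in the gold scores.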
Related papers
- Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs [2.324913904215885]
This paper introduces Rationale-based Multiple Trait Scoring (RMTS), a novel approach for multi-trait essay scoring.
RMTS integrates prompt-engineering-based large language models (LLMs) with a fine-tuning-based essay scoring model that uses a smaller large language model (S-LLM).
Experiments on benchmark datasets, including ASAP, ASAP++, and Feedback Prize, show that RMTS significantly outperforms state-of-the-art models and vanilla S-LLMs in trait-specific scoring.
arXiv Detail & Related papers (2024-10-18T06:35:17Z)
- Hey AI Can You Grade My Essay?: Automatic Essay Grading [1.03590082373586]
We introduce a new model that outperforms the state-of-the-art models in the field of automatic essay grading (AEG).
We use collaborative and transfer learning: one network checks the grammatical and structural features of an essay's sentences, while another scores the overall idea presented in the essay.
Our proposed model has shown the highest accuracy of 85.50%.
arXiv Detail & Related papers (2024-10-12T01:17:55Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task as a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify a near-optimal prompt that improves in-context learning performance.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite being trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays [41.851075178681015]
We describe a way to score essays holistically using a multi-task learning (MTL) approach.
We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs.
We find that the MTL-based BiLSTM system gives the best results for scoring the essay holistically, while also performing well on scoring the essay traits.
arXiv Detail & Related papers (2021-02-01T11:31:09Z)
- Automated Topical Component Extraction Using Neural Network Attention Scores from Source-based Essay Scoring [15.234595490118542]
This paper presents a method for linking automated essay scoring (AES) and automated writing evaluation (AWE).
We evaluate performance using a feature-based AES system that requires Topical Components (TCs).
Results show that performance is comparable whether automatically or manually constructed TCs are used for (1) representing essays as rubric-based features and (2) grading essays.
arXiv Detail & Related papers (2020-08-04T20:13:51Z)
- Prompt Agnostic Essay Scorer: A Domain Generalization Approach to Cross-prompt Automated Essay Scoring [61.21967763569547]
Cross-prompt automated essay scoring (AES) requires the system to use non-target-prompt essays to award scores to a target-prompt essay.
This paper introduces Prompt Agnostic Essay Scorer (PAES) for cross-prompt AES.
Our method requires no access to labelled or unlabelled target-prompt data during training and is a single-stage approach.
arXiv Detail & Related papers (2020-08-04T10:17:38Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)