Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
- URL: http://arxiv.org/abs/2305.16826v1
- Date: Fri, 26 May 2023 11:11:19 GMT
- Title: Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring
- Authors: Heejin Do, Yunsu Kim, Gary Geunbae Lee
- Abstract summary: Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic.
Most existing AES systems assume that the essays to be graded share the prompt used in training, and they assign only a holistic score.
We propose a robust model: prompt- and trait relation-aware cross-prompt essay trait scorer.
- Score: 3.6825890616838066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated essay scoring (AES) aims to score essays written for a given
prompt, which defines the writing topic. Most existing AES systems assume that
the essays to be graded share the prompt used in training, and they assign only a holistic
score. However, such settings conflict with real-education situations;
pre-graded essays for a particular prompt are lacking, and detailed trait
scores of sub-rubrics are required. Thus, predicting various trait scores of
unseen-prompt essays (called cross-prompt essay trait scoring) is a remaining
challenge of AES. In this paper, we propose a robust model: prompt- and trait
relation-aware cross-prompt essay trait scorer. We encode a prompt-aware essay
representation using essay-prompt attention and a topic-coherence
feature extracted by a topic-modeling mechanism without access to labeled
data; therefore, our model considers the prompt adherence of an essay, even in
a cross-prompt setting. To facilitate multi-trait scoring, we design a
trait-similarity loss that encapsulates the correlations of traits. Experiments
prove the efficacy of our model, showing state-of-the-art results for all
prompts and traits. Significant improvements in low-resource-prompt and
inferior traits further indicate our model's strength.
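The abstract does not give a formula for the trait-similarity loss, so the following is only a minimal sketch of one plausible form, not the authors' definition. It assumes that trait pairs whose ground-truth scores are strongly correlated should also receive similar predicted score vectors, and it penalizes one minus the cosine similarity for those pairs. The function names, the `threshold` parameter, and its default value are hypothetical.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors (eps avoids division by zero)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def trait_similarity_loss(pred, gold, threshold=0.7):
    """Illustrative trait-similarity penalty (an assumption, not the paper's formula).

    pred, gold: arrays of shape (batch, n_traits) holding predicted and
    ground-truth trait scores. For every trait pair whose ground-truth
    Pearson correlation is at least `threshold`, add 1 - cosine similarity
    of the two predicted trait columns. Returns the mean over those pairs,
    or 0.0 if no pair qualifies.
    """
    n_traits = gold.shape[1]
    total, pairs = 0.0, 0
    for j in range(n_traits):
        for k in range(j + 1, n_traits):
            corr = np.corrcoef(gold[:, j], gold[:, k])[0, 1]
            if corr >= threshold:
                total += 1.0 - cosine(pred[:, j], pred[:, k])
                pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy batch: traits 0 and 1 move together, trait 2 does not.
    gold = np.array([[1, 2, 5], [2, 3, 1], [3, 4, 4], [4, 5, 2]], dtype=float)
    pred = gold.copy()
    pred[:, 1] = [5, 4, 3, 2]  # predictions that break the trait 0/1 relationship
    print(trait_similarity_loss(gold, gold), trait_similarity_loss(pred, gold))
```

In a multi-trait scorer, a term like this would typically be added to the per-trait regression loss with a weighting coefficient; the weighting scheme is likewise not specified in the abstract.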
Related papers
- Hey AI Can You Grade My Essay?: Automatic Essay Grading [1.03590082373586]
We introduce a new model that outperforms the state-of-the-art models in the field of automatic essay grading (AEG).
We use collaborative and transfer learning: one network checks the grammatical and structural features of the sentences of an essay, while another network scores the overall idea present in the essay.
Our proposed model has shown the highest accuracy of 85.50%.
(2024-10-12T01:17:55Z)
- Graded Relevance Scoring of Written Essays with Dense Retrieval [4.021352247826289]
We propose a novel approach for graded relevance scoring of written essays that employs dense retrieval encoders.
We leverage Contriever, which is pre-trained with contrastive learning and has demonstrated performance comparable to supervised dense retrieval models.
Our method establishes a new state-of-the-art performance in the task-specific scenario, while its extension for the cross-task scenario exhibited a performance that is on par with the state-of-the-art model for that scenario.
(2024-05-08T16:37:58Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases the difficulty of verbalizer design by reformulating the few-shot text classification task as a text-pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
(2023-06-15T06:51:35Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
(2023-04-24T12:58:28Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving the performance of in-context learning.
(2023-03-23T12:28:25Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite being trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect, with high accuracy, samples that cause oversensitivity and overstability.
(2021-09-24T03:49:38Z)
- Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays [41.851075178681015]
We describe a way to score essays holistically using a multi-task learning (MTL) approach.
We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs.
We find that the MTL-based BiLSTM system gives the best results for scoring the essay holistically and also performs well on scoring the essay traits.
(2021-02-01T11:31:09Z)
- Automated Topical Component Extraction Using Neural Network Attention Scores from Source-based Essay Scoring [15.234595490118542]
This paper presents a method for linking automated essay scoring (AES) and automated writing evaluation (AWE).
We evaluate performance using a feature-based AES requiring Topical Components (TCs).
Results show that performance is comparable whether using automatically or manually constructed TCs for 1) representing essays as rubric-based features and 2) grading essays.
(2020-08-04T20:13:51Z)
- Prompt Agnostic Essay Scorer: A Domain Generalization Approach to Cross-prompt Automated Essay Scoring [61.21967763569547]
Cross-prompt automated essay scoring (AES) requires the system to use non-target-prompt essays to award scores to a target-prompt essay.
This paper introduces Prompt Agnostic Essay Scorer (PAES) for cross-prompt AES.
Our method requires no access to labelled or unlabelled target-prompt data during training and is a single-stage approach.
(2020-08-04T10:17:38Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
(2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.