Hey AI Can You Grade My Essay?: Automatic Essay Grading
- URL: http://arxiv.org/abs/2410.09319v1
- Date: Sat, 12 Oct 2024 01:17:55 GMT
- Title: Hey AI Can You Grade My Essay?: Automatic Essay Grading
- Authors: Maisha Maliha, Vishal Pramanik
- Abstract summary: We introduce a new model that outperforms the state-of-the-art models in the field of automatic essay grading (AEG)
We have used the concept of collaborative and transfer learning, where one network will be responsible for checking the grammatical and structural features of the sentences of an essay while another network is responsible for scoring the overall idea present in the essay.
Our proposed model has shown the highest accuracy of 85.50%.
- Score: 1.03590082373586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic essay grading (AEG) has attracted the attention of the NLP community because of its applications in education, such as scoring essays, short answers, etc. AEG systems can save significant time and money when grading essays. In existing works, a single network is responsible for the whole grading process, which may be ineffective because a single network may not be able to learn all the features of a human-written essay. In this work, we introduce a new model that outperforms the state-of-the-art models in the field of AEG. We use the concept of collaborative and transfer learning, where one network is responsible for checking the grammatical and structural features of the sentences of an essay while another is responsible for scoring the overall idea presented in the essay. These learnings are transferred to another network, which scores the essay. We also compared the performance of the different models mentioned in our work, and our proposed model achieved the highest accuracy of 85.50%.
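The abstract gives no implementation details, but the described setup (one network for grammar/structure, one for the overall idea, with their learned features transferred to a scoring network) can be sketched roughly as follows. All module names, layer sizes, and the concatenation-based fusion are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the two-network, collaborative/transfer-learning AEG
# setup described in the abstract. Names, sizes, and fusion by concatenation
# are assumptions for illustration only.
import torch
import torch.nn as nn


class FeatureNet(nn.Module):
    """BiLSTM encoder; instantiated once for grammar/structure and once for the overall idea."""
    def __init__(self, vocab_size: int, emb_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.lstm(self.emb(token_ids))
        return torch.cat([h[-2], h[-1]], dim=-1)          # (batch, 2 * hidden)


class EssayScorer(nn.Module):
    """Receives the transferred representations of both networks and scores the essay."""
    def __init__(self, grammar_net: FeatureNet, idea_net: FeatureNet, hidden: int = 256):
        super().__init__()
        self.grammar_net, self.idea_net = grammar_net, idea_net
        # Freeze the two pre-trained networks so only their learned features are transferred.
        for p in list(grammar_net.parameters()) + list(idea_net.parameters()):
            p.requires_grad = False
        self.head = nn.Sequential(nn.Linear(4 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.grammar_net(token_ids),
                           self.idea_net(token_ids)], dim=-1)
        return self.head(feats).squeeze(-1)               # predicted essay score


# Toy usage: grammar_net and idea_net would first be trained on their own tasks
# (not shown); afterwards only the scoring head is trained on essay scores.
grammar_net, idea_net = FeatureNet(vocab_size=30000), FeatureNet(vocab_size=30000)
scorer = EssayScorer(grammar_net, idea_net)
scores = scorer(torch.randint(0, 30000, (4, 200)))        # 4 essays of 200 tokens each
```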
Related papers
- Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression [27.152245569974678]
We develop two models that automatically score English essays across multiple dimensions.
Our systems achieve strong performance under three evaluation criteria: precision, F1 score, and Quadratic Weighted Kappa.
arXiv Detail & Related papers (2024-06-03T10:59:50Z)
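Quadratic Weighted Kappa, cited in the entry above, is the standard rater-agreement metric in essay scoring. As a quick illustration with made-up scores, it can be computed with scikit-learn:

```python
# Illustrative computation of Quadratic Weighted Kappa (QWK); the scores are toy data.
from sklearn.metrics import cohen_kappa_score

human_scores = [4, 3, 5, 2, 4, 3]   # gold scores from human raters (made up)
model_scores = [4, 3, 4, 2, 5, 3]   # scores predicted by an AES model (made up)

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```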
- Graded Relevance Scoring of Written Essays with Dense Retrieval [4.021352247826289]
We propose a novel approach for graded relevance scoring of written essays that employs dense retrieval encoders.
We leverage Contriever, which is pre-trained with contrastive learning and has demonstrated performance comparable to supervised dense retrieval models.
Our method establishes a new state of the art in the task-specific scenario, while its extension to the cross-task scenario performs on par with the state-of-the-art model for that scenario.
arXiv Detail & Related papers (2024-05-08T16:37:58Z)
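The entry above mentions Contriever, a contrastively pre-trained dense retrieval encoder. A rough sketch of scoring essay-prompt relevance with such an encoder might look like the following; mean pooling and cosine similarity are common choices for Contriever, but the exact setup is an assumption, not the paper's method.

```python
# Hedged sketch: essay-prompt relevance with a dense retrieval encoder
# (facebook/contriever). Pooling and similarity choices are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
encoder = AutoModel.from_pretrained("facebook/contriever")


def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state       # (1, seq, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)          # (1, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, dim)


prompt = "Discuss whether computers have a positive effect on people."  # toy prompt
essay = "Computers help students research topics quickly, but overuse ..."  # toy essay

relevance = F.cosine_similarity(embed(prompt), embed(essay)).item()
print(f"prompt-essay relevance: {relevance:.3f}")
```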
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring [3.6825890616838066]
Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic.
Most existing AES systems assume that the essays to be graded share the prompt used in training and assign only a holistic score.
We propose a robust model: a prompt- and trait-relation-aware cross-prompt essay trait scorer.
arXiv Detail & Related papers (2023-05-26T11:11:19Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays [41.851075178681015]
We describe a way to score essays holistically using a multi-task learning (MTL) approach.
We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs.
We find that the MTL-based BiLSTM system gives the best results for scoring the essay holistically, while also performing well on scoring the essay traits.
arXiv Detail & Related papers (2021-02-01T11:31:09Z)
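As a rough illustration of the multi-task setup described in the entry above, the sketch below uses one shared BiLSTM encoder with a holistic-score head and one head per trait. The trait list, layer sizes, and mean pooling are assumptions, not the paper's exact model.

```python
# Hypothetical multi-task BiLSTM essay scorer: shared encoder, one holistic head,
# one head per trait. Traits, sizes, and pooling are illustrative assumptions.
import torch
import torch.nn as nn

TRAITS = ["content", "organization", "word_choice", "conventions"]  # assumed trait set


class MultiTaskEssayScorer(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 100, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.holistic_head = nn.Linear(2 * hidden, 1)
        self.trait_heads = nn.ModuleDict({t: nn.Linear(2 * hidden, 1) for t in TRAITS})

    def forward(self, token_ids: torch.Tensor) -> dict:
        out, _ = self.bilstm(self.emb(token_ids))          # (batch, seq, 2 * hidden)
        pooled = out.mean(dim=1)                           # mean-pool over tokens
        scores = {"holistic": self.holistic_head(pooled).squeeze(-1)}
        scores.update({t: head(pooled).squeeze(-1) for t, head in self.trait_heads.items()})
        return scores


# Toy usage: all heads are trained jointly, sharing the encoder across tasks.
scorer = MultiTaskEssayScorer(vocab_size=30000)
outputs = scorer(torch.randint(0, 30000, (2, 150)))        # 2 essays of 150 tokens each
```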
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)