Assessing instructor-AI cooperation for grading essay-type questions in an introductory sociology course
- URL: http://arxiv.org/abs/2501.06461v1
- Date: Sat, 11 Jan 2025 07:18:12 GMT
- Title: Assessing instructor-AI cooperation for grading essay-type questions in an introductory sociology course
- Authors: Francisco Olivos, Tobias Kamelski, Sebastián Ascui-Gac,
- Abstract summary: We evaluate generative pre-trained transformers (GPT) models' performance in transcribing and scoring students' responses.
For grading, GPT demonstrated strong correlations with the human grader scores, especially when template answers were provided.
This study contributes to the growing literature on AI in education, demonstrating its potential to enhance fairness and efficiency in grading essay-type questions.
- Score: 0.0
- License:
- Abstract: This study explores the use of artificial intelligence (AI) as a complementary tool for grading essay-type questions in higher education, focusing on its consistency with human grading and potential to reduce biases. Using 70 handwritten exams from an introductory sociology course, we evaluated generative pre-trained transformers (GPT) models' performance in transcribing and scoring students' responses. GPT models were tested under various settings for both transcription and grading tasks. Results show high similarity between human and GPT transcriptions, with GPT-4o-mini outperforming GPT-4o in accuracy. For grading, GPT demonstrated strong correlations with the human grader scores, especially when template answers were provided. However, discrepancies remained, highlighting GPT's role as a "second grader" to flag inconsistencies for assessment reviewing rather than fully replace human evaluation. This study contributes to the growing literature on AI in education, demonstrating its potential to enhance fairness and efficiency in grading essay-type questions.
Related papers
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z) - Beyond human subjectivity and error: a novel AI grading system [67.410870290301]
The grading of open-ended questions is a high-effort, high-impact task in education.
Recent breakthroughs in AI technology might facilitate such automation, but this has not been demonstrated at scale.
We introduce a novel automatic short answer grading (ASAG) system.
arXiv Detail & Related papers (2024-05-07T13:49:59Z) - Gemini Pro Defeated by GPT-4V: Evidence from Education [1.0226894006814744]
GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and Quadratic Weighted Kappa.
Findings suggest GPT-4V's superior capability in handling complex educational tasks.
arXiv Detail & Related papers (2023-12-27T02:56:41Z) - CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
arXiv Detail & Related papers (2023-11-30T16:52:42Z) - Is GPT a Computational Model of Emotion? Detailed Analysis [2.0001091112545066]
This paper investigates the emotional reasoning abilities of the GPT family of large language models via a component perspective.
It shows that GPT's predictions align significantly with human-provided appraisals and emotional labels.
However, GPT faces difficulties predicting emotion intensity and coping responses.
arXiv Detail & Related papers (2023-07-25T19:34:44Z) - Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise
Given to Students in Synthetic Dialogues [2.3361634876233817]
Large language models, such as the AI-chatbot ChatGPT, hold potential for offering constructive feedback to tutors in practical settings.
The accuracy of AI-generated feedback remains uncertain, with scant research investigating the ability of models like ChatGPT to deliver effective feedback.
arXiv Detail & Related papers (2023-07-05T04:14:01Z) - GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study [0.0]
GPT4 was developed to assist in the peer-review process.
By comparing reviews generated by both human reviewers and GPT models for academic papers submitted to a major machine learning conference, we provide initial evidence that artificial intelligence can contribute effectively to the peer-review process.
arXiv Detail & Related papers (2023-06-16T23:11:06Z) - INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained
Feedback [80.57617091714448]
We present InstructScore, an explainable evaluation metric for text generation.
We fine-tune a text evaluation metric based on LLaMA, producing a score for generated text and a human readable diagnostic report.
arXiv Detail & Related papers (2023-05-23T17:27:22Z) - Collaborative Generative AI: Integrating GPT-k for Efficient Editing in
Text-to-Image Generation [114.80518907146792]
We investigate the potential of utilizing large-scale language models, such as GPT-k, to improve the prompt editing process for text-to-image generation.
We compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.
arXiv Detail & Related papers (2023-05-18T21:53:58Z) - Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring
Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications(as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.