FABRIC: Automated Scoring and Feedback Generation for Essays
- URL: http://arxiv.org/abs/2310.05191v1
- Date: Sun, 8 Oct 2023 15:00:04 GMT
- Title: FABRIC: Automated Scoring and Feedback Generation for Essays
- Authors: Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Hyunseung Lim, Yoonsu
Kim, Tak Yeon Lee, Hwajung Hong, Juho Kim, So-Yeon Ahn, Alice Oh
- Abstract summary: We present FABRIC, a pipeline to help students and instructors in English writing classes by automatically generating 1) the overall scores, 2) specific rubric-based scores, and 3) detailed feedback on how to improve the essays.
We evaluate the effectiveness of the new DREsS and the augmentation strategy CASE quantitatively and show significant improvements over the models trained with existing datasets.
- Score: 41.979996110725324
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automated essay scoring (AES) provides a useful tool for students and
instructors in writing classes by generating essay scores in real-time.
However, previous AES models provide neither specific rubric-based scores nor
feedback on how to improve the essays, both of which can be even more important
for learning than the overall scores. We present FABRIC, a pipeline to help
students and instructors in English writing classes by automatically generating
1) the overall scores, 2) specific rubric-based scores, and 3) detailed
feedback on how to improve the essays. Under the guidance of English education
experts, we chose the rubrics for the specific scores as content, organization,
and language. The first component of the FABRIC pipeline is DREsS, a real-world
Dataset for Rubric-based Essay Scoring. The second component is CASE, a
Corruption-based Augmentation Strategy for Essays, with which we can improve
the accuracy of the baseline model by 45.44%. The third component is EssayCoT,
the Essay Chain-of-Thought prompting strategy which uses scores predicted from
the AES model to generate better feedback. We evaluate the effectiveness of the
new dataset DREsS and the augmentation strategy CASE quantitatively and show
significant improvements over the models trained with existing datasets. We
evaluate the feedback generated by EssayCoT with English education experts to
show significant improvements in the helpfulness of the feedback across all
rubrics. Lastly, we evaluate the FABRIC pipeline with students in a college
English writing class, who rated the generated scores and feedback at an
average of 6 on a 7-point Likert scale.
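The abstract does not spell out how CASE corrupts essays, but a corruption-based augmentation along these lines can be sketched as follows. This is a minimal illustration, assuming three rubric-aligned corruption operations (sentence shuffling for organization, off-topic injection for content, word dropping for language) and a 1-5 score scale with a fixed penalty; none of these details are taken from the paper.

```python
import random

# Illustrative sketch of corruption-based essay augmentation (CASE-style).
# The corruption operations and score penalty are assumptions, not the
# authors' published implementation.

def corrupt_organization(sentences):
    # Assumed corruption: shuffle sentence order to break the essay's flow.
    shuffled = list(sentences)
    random.shuffle(shuffled)
    return shuffled

def corrupt_content(sentences, off_topic_pool, ratio=0.3):
    # Assumed corruption: replace a fraction of sentences with off-topic ones.
    corrupted = list(sentences)
    k = max(1, int(len(corrupted) * ratio))
    for i in random.sample(range(len(corrupted)), k):
        corrupted[i] = random.choice(off_topic_pool)
    return corrupted

def corrupt_language(sentences, drop_ratio=0.2):
    # Assumed corruption: randomly drop words to mimic language errors.
    corrupted = []
    for sentence in sentences:
        words = sentence.split()
        kept = [w for w in words if random.random() > drop_ratio] or words[:1]
        corrupted.append(" ".join(kept))
    return corrupted

def case_augment(sentences, scores, rubric, off_topic_pool, penalty=2):
    """Return one synthetic (essay, scores) pair whose `rubric` score is
    lowered to match the injected corruption (1-5 scale assumed)."""
    corrupt = {
        "organization": corrupt_organization,
        "content": lambda s: corrupt_content(s, off_topic_pool),
        "language": corrupt_language,
    }[rubric]
    new_scores = dict(scores)
    new_scores[rubric] = max(1, scores[rubric] - penalty)
    return " ".join(corrupt(sentences)), new_scores
```

Pairing each corrupted essay with its correspondingly lowered rubric score yields extra supervised training examples, which is the kind of augmentation the abstract credits for the 45.44% accuracy gain.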
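Similarly, EssayCoT conditions feedback generation on the scores predicted by the AES model. A minimal prompt-construction sketch follows; the prompt wording, the `essaycot_prompt` helper, and the 1-5 score range are illustrative assumptions rather than the paper's exact prompt.

```python
def essaycot_prompt(essay: str, predicted_scores: dict) -> str:
    """Build a feedback prompt that feeds the AES model's rubric scores to
    the LLM and asks for step-by-step reasoning before suggestions."""
    score_lines = "\n".join(
        f"- {rubric}: {score:.1f} / 5"  # 1-5 scale assumed for illustration
        for rubric, score in predicted_scores.items()
    )
    return (
        "You are an English writing tutor.\n\n"
        f"Essay:\n{essay}\n\n"
        "An automated scoring model rated this essay as follows:\n"
        f"{score_lines}\n\n"
        "For each rubric, reason step by step about why the essay earned "
        "its score, then give concrete suggestions for improvement."
    )

# In practice the scores would come from the trained AES model.
prompt = essaycot_prompt(
    "Essay text ...",
    {"content": 3.5, "organization": 2.5, "language": 4.0},
)
```

Grounding the feedback request in the predicted scores is what distinguishes this prompting strategy from asking an LLM for generic feedback on the raw essay.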
Related papers
- Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring [0.0]
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning.
We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters.
The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels.
arXiv Detail & Related papers (2025-05-13T08:50:57Z)
- Enhanced Bloom's Educational Taxonomy for Fostering Information Literacy in the Era of Large Language Models [16.31527042425208]
This paper proposes an LLM-driven Bloom's Educational Taxonomy that aims to recognize and evaluate students' information literacy (IL) with Large Language Models (LLMs).
The framework organizes the cognitive abilities required to use LLMs into two distinct IL stages: Exploration & Action and Creation & Metacognition.
arXiv Detail & Related papers (2025-03-25T08:23:49Z)
- Position: LLMs Can be Good Tutors in Foreign Language Education [87.88557755407815]
We argue that large language models (LLMs) have the potential to serve as effective tutors in foreign language education (FLE).
Specifically, LLMs can play three critical roles: (1) as data enhancers, improving the creation of learning materials or serving as student simulations; (2) as task predictors, supporting learner assessment or optimizing learning pathways; and (3) as agents, enabling personalized and inclusive education.
arXiv Detail & Related papers (2025-02-08T06:48:49Z)
- Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students [53.20318273452059]
Large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education.
Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs.
We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.
arXiv Detail & Related papers (2024-11-27T19:19:34Z)
- LLM-Driven Learning Analytics Dashboard for Teachers in EFL Writing Education [37.904037443211905]
The dashboard facilitates the analysis of student interactions with an essay writing system that integrates ChatGPT for real-time feedback.
By combining insights from NLP and Human-Computer Interaction (HCI), this study demonstrates how a human-centered approach can enhance the effectiveness of teacher dashboards.
arXiv Detail & Related papers (2024-10-19T07:46:11Z)
- Exploring Knowledge Tracing in Tutor-Student Dialogues [53.52699766206808]
We present a first attempt at performing knowledge tracing (KT) in tutor-student dialogues.
We propose methods to identify the knowledge components/skills involved in each dialogue turn.
We then apply a range of KT methods on the resulting labeled data to track student knowledge levels over an entire dialogue.
arXiv Detail & Related papers (2024-09-24T22:31:39Z)
- Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course [49.296957552006226]
Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research.
This report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students.
arXiv Detail & Related papers (2024-07-07T00:17:24Z)
- I don't trust you (anymore)! -- The effect of students' LLM use on Lecturer-Student-Trust in Higher Education [0.0]
The integration of Large Language Models (LLMs) into platforms like OpenAI's ChatGPT has led to their rapid adoption among university students.
This study addresses the research question: How does the use of LLMs by students impact Informational and Procedural Justice, influencing Team Trust and Expected Team Performance?
Our findings indicate that lecturers are less concerned about the fairness of LLM use per se but are more focused on the transparency of student utilization.
arXiv Detail & Related papers (2024-06-21T05:35:57Z)
- The Life Cycle of Large Language Models: A Review of Biases in Education [3.8757867335422485]
Large Language Models (LLMs) are increasingly adopted in educational contexts to provide personalized support to students and teachers.
The integration of LLMs in education technology has renewed concerns over algorithmic bias which may exacerbate educational inequities.
This review aims to clarify the complex nature of bias in LLM applications and provide practical guidance for their evaluation to promote educational equity.
arXiv Detail & Related papers (2024-06-03T18:00:28Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- Student Perspectives on Using a Large Language Model (LLM) for an Assignment on Professional Ethics [0.0]
The advent of Large Language Models (LLMs) has sparked a serious discussion among educators about how these models will affect curricula, assessments, and students' competencies.
This report presents an assignment within a course on professional competencies, including some related to ethics, that computing master's students need in their careers.
arXiv Detail & Related papers (2024-04-09T09:03:47Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on consist mainly of tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.