Do Tutors Learn from Equity Training and Can Generative AI Assess It?
- URL: http://arxiv.org/abs/2412.11255v1
- Date: Sun, 15 Dec 2024 17:36:40 GMT
- Title: Do Tutors Learn from Equity Training and Can Generative AI Assess It?
- Authors: Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz, Kenneth R. Koedinger
- Abstract summary: We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations.
We find marginally significant learning gains with increases in tutors' self-reported confidence in their knowledge.
This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts.
- Score: 2.116573423199236
- Abstract: Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale, are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications; the present work assesses their use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. We find marginally significant learning gains from pretest to posttest, along with increases in tutors' self-reported confidence in their knowledge of how to respond to middle school students experiencing possible inequities. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors' ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine that few-shot learning using GPT-4o is the preferred model. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves leveling the difficulty among scenarios and enhancing LLM prompts for large-scale grading and assessment.
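For readers who want a concrete picture of the preferred setup (few-shot grading with GPT-4o), the following is a minimal sketch using the OpenAI API. The rubric wording, labels, and example responses are illustrative placeholders, not the authors' released prompts (those accompany the paper's dataset):

```python
# Hedged sketch of few-shot rubric grading with GPT-4o. The scenario text,
# binary rubric, and few-shot examples below are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FEW_SHOT_EXAMPLES = [
    # (tutor response, human-annotated grade) pairs supplied in the prompt
    ("I would ask the student privately how they are feeling.", "1"),
    ("I would ignore the comment and move on with the lesson.", "0"),
]

def grade_tutor_response(scenario: str, response: str) -> str:
    """Return a binary rubric grade: '1' = recommended approach, '0' = not."""
    messages = [{
        "role": "system",
        "content": (
            "You grade tutor responses to potentially inequitable situations. "
            "Output only 1 (matches the recommended approach) or 0 (does not)."
        ),
    }]
    # Few-shot learning: each annotated example becomes a user/assistant turn.
    for example, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user",
                         "content": f"Scenario: {scenario}\nResponse: {example}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user",
                     "content": f"Scenario: {scenario}\nResponse: {response}"})
    result = client.chat.completions.create(
        model="gpt-4o", messages=messages, temperature=0
    )
    return result.choices[0].message.content.strip()
```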
Related papers
- Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors [7.834688858839734]
We investigate whether current state-of-the-art large language models (LLMs) are effective as AI tutors.
We propose a unified evaluation taxonomy with eight pedagogical dimensions based on key learning sciences principles.
We release MRBench - a new evaluation benchmark containing 192 conversations and 1,596 responses from seven state-of-the-art LLM-based and human tutors.
arXiv Detail & Related papers (2024-12-12T16:24:35Z)
- How Good is ChatGPT in Giving Adaptive Guidance Using Knowledge Graphs in E-Learning Environments? [0.8999666725996978]
This study introduces an approach that integrates dynamic knowledge graphs with large language models (LLMs) to offer nuanced student assistance.
Central to this method is the knowledge graph's role in assessing a student's comprehension of topic prerequisites.
Preliminary findings suggest students could benefit from this tiered support, achieving enhanced comprehension and improved task outcomes.
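As a rough illustration of the prerequisite check (not the paper's implementation), a small graph walk can surface the gaps a system would address before tiering its LLM guidance; the topics and mastery data here are invented:

```python
# Toy prerequisite knowledge graph; topic names are made up for the example.
PREREQUISITES = {
    "quadratic_equations": ["linear_equations", "factoring"],
    "linear_equations": ["arithmetic"],
    "factoring": ["arithmetic"],
    "arithmetic": [],
}

def unmastered_prerequisites(topic: str, mastered: set[str]) -> list[str]:
    """Walk the prerequisite graph and collect gaps to address first."""
    gaps, stack = [], list(PREREQUISITES.get(topic, []))
    while stack:
        prereq = stack.pop()
        if prereq not in mastered and prereq not in gaps:
            gaps.append(prereq)
            stack.extend(PREREQUISITES.get(prereq, []))
    return gaps

# A system could prepend these gaps to the LLM prompt to tier its guidance:
print(unmastered_prerequisites("quadratic_equations", mastered={"arithmetic"}))
# -> ['factoring', 'linear_equations'] (order depends on traversal)
```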
arXiv Detail & Related papers (2024-12-05T04:05:43Z)
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials in an instant manner.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
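A loose, assumption-laden sketch of that self-annotation step follows: a chat model drafts Q&A pairs over knowledge-base chunks, which would then feed iterative fine-tuning. The model name and prompt are placeholders, not KBAlign's actual setup:

```python
# Hypothetical self-annotation pass: the model writes its own Q&A pairs
# grounded in KB chunks, producing training data for later fine-tuning.
from openai import OpenAI

client = OpenAI()

def self_annotate(kb_chunks: list[str]) -> list[dict]:
    """Generate one grounded Q&A pair per knowledge-base chunk."""
    pairs = []
    for chunk in kb_chunks:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in; KBAlign adapts its own model
            messages=[{
                "role": "user",
                "content": (
                    "Write one question answerable only from this passage, "
                    "then the answer, separated by '\\nA: '.\n\n" + chunk
                ),
            }],
        ).choices[0].message.content
        question, _, answer = reply.partition("\nA: ")
        pairs.append({"question": question.strip(),
                      "answer": answer.strip(),
                      "source": chunk})
    return pairs  # fed into iterative fine-tuning in the actual method
```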
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [176.39275404745098]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions.
GPT-4 answers an average of 65.8% of questions correctly and produces the correct answer under at least one prompting strategy for 85.1% of questions.
Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z)
- Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course [49.296957552006226]
Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research.
This report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students.
arXiv Detail & Related papers (2024-07-07T00:17:24Z)
- CourseAssist: Pedagogically Appropriate AI Tutor for Computer Science Education [1.052788652996288]
This poster introduces CourseAssist, a novel LLM-based tutoring system tailored for computer science education.
Unlike generic LLM systems, CourseAssist uses retrieval-augmented generation, user intent classification, and question decomposition to align AI responses with specific course materials and learning objectives.
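The poster does not include an implementation, but the described pipeline could plausibly be shaped like the sketch below, with every helper name hypothetical: classify intent, decompose the question, retrieve course material, then answer grounded in it:

```python
# Speculative outline of a RAG tutoring pipeline in the spirit of the poster.
# `retrieve` is a caller-supplied function over indexed course materials.
from openai import OpenAI

client = OpenAI()

def ask_llm(prompt: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return out.choices[0].message.content

def answer_student(question: str, retrieve) -> str:
    # 1. User intent classification (categories are invented here).
    intent = ask_llm(f"Classify the intent (concept/debugging/logistics): {question}")
    # 2. Question decomposition into simpler retrieval queries.
    subquestions = ask_llm(f"Break this into simpler sub-questions:\n{question}").splitlines()
    # 3. Retrieval-augmented generation over course materials.
    context = "\n\n".join(retrieve(q) for q in subquestions if q.strip())
    return ask_llm(
        f"Intent: {intent}\nCourse material:\n{context}\n\n"
        f"Answer the student's question using only the material above, "
        f"consistent with the course's learning objectives:\n{question}"
    )
```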
arXiv Detail & Related papers (2024-05-01T20:43:06Z)
- Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
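That loop can be approximated as best-of-n sampling with an LM judge supplying the reward; in this sketch the prompts, the 1-10 scale, and the model choice are assumptions rather than the paper's setup:

```python
# Hedged sketch: one LM proposes instructional materials, a second LM call
# scores them, and the highest-scoring draft wins (judgment as reward).
from openai import OpenAI

client = OpenAI()

def generate(prompt: str, n: int = 4) -> list[str]:
    """Sample n candidate worksheets from the generator LM."""
    out = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}],
        n=n, temperature=1.0,
    )
    return [choice.message.content for choice in out.choices]

def judge(material: str) -> float:
    """Score a worksheet with an LM acting as an educational expert."""
    out = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            "As an educational expert, rate how much this worksheet would "
            "improve learning outcomes, 1-10. Reply with the number only.\n\n"
            + material}],
        temperature=0,
    )
    return float(out.choices[0].message.content.strip())

drafts = generate("Write a short worksheet introducing fractions to 4th graders.")
best = max(drafts, key=judge)  # the LM judgment acts as the reward signal
```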
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
- Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation [10.419430731115405]
One-on-one tutoring is an effective instructional method for enhancing learning, yet its efficacy hinges on tutor competencies.
This study aims to harness Generative Pre-trained Transformers (GPT), such as the GPT-3.5 and GPT-4 models, to automatically assess tutors' ability to use social-emotional tutoring strategies.
arXiv Detail & Related papers (2024-02-04T20:42:30Z)
- Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors [2.099922236065961]
We investigate the capacity of generative AI to evaluate real-life tutors' performance in responding to students making math errors.
By analyzing 50 real-life tutoring dialogues, we find both GPT-3.5-Turbo and GPT-4 demonstrate proficiency in assessing the criteria related to reacting to students making errors.
GPT-4 tends to overidentify instances of students making errors, often attributing student uncertainty or inferring potential errors where human evaluators did not.
arXiv Detail & Related papers (2024-01-06T15:34:27Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.