Related papers: Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students

Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students

URL: http://arxiv.org/abs/2504.10961v2
Date: Tue, 12 Aug 2025 10:35:05 GMT
Title: Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students
Authors: Audrey Zhang, Yifei Gao, Wannapon Suraworachet, Tanya Nazaretsky, Mutlu Cukurova,
Abstract summary: This study compares undergraduate students' trust in large language models (LLMs), human, and human-AI co-produced feedback in their authentic HE context.<n>Findings revealed students preferred AI and co-produced feedback over human feedback regarding perceived usefulness and objectivity.<n>Educational AI experience improved students' ability to identify LLM-generated feedback and increased their trust in all types of feedback.
Score: 2.935250567679577
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: As generative AI models, particularly large language models (LLMs), transform educational feedback practices in higher education (HE) contexts, understanding students' perceptions of different sources of feedback becomes crucial for their effective implementation and adoption. This study addresses a critical gap by comparing undergraduate students' trust in LLM, human, and human-AI co-produced feedback in their authentic HE context. More specifically, through a within-subject experimental design involving 91 participants, we investigated factors that predict students' ability to distinguish between feedback types, their perceptions of feedback quality, and potential biases related to the source of feedback. Findings revealed that when the source was blinded, students generally preferred AI and co-produced feedback over human feedback regarding perceived usefulness and objectivity. However, they presented a strong bias against AI when the source of feedback was disclosed. In addition, only AI feedback suffered a decline in perceived genuineness when feedback sources were revealed, while co-produced feedback maintained its positive perception. Educational AI experience improved students' ability to identify LLM-generated feedback and increased their trust in all types of feedback. More years of students' experience using AI for general purposes were associated with lower perceived usefulness and credibility of feedback. These insights offer substantial evidence of the importance of source credibility and the need to enhance both feedback literacy and AI literacy to mitigate bias in student perceptions for AI-generated feedback to be adopted and impact education.

Related papers

LLM-based Multimodal Feedback Produces Equivalent Learning and Better Student Perceptions than Educator Feedback [4.225232488376583]
This study introduces a real-time AI-facilitated multimodal feedback system that integrates structured textual explanations with dynamic multimedia resources.<n>In an online crowdsourcing experiment, we compared this system against fixed business-as-usual feedback by educators across three dimensions.<n>Results showed that AI multimodal feedback achieved learning gains equivalent to original educator feedback while significantly outperforming it on perceived clarity, specificity, conciseness, motivation, satisfaction, and reducing cognitive load.
arXiv Detail & Related papers (2026-01-21T18:58:08Z)
Directive, Metacognitive or a Blend of Both? A Comparison of AI-Generated Feedback Types on Student Engagement, Confidence, and Outcomes [1.8839714322633465]
This study presents a semester-long randomised controlled trial with 329 students in an introductory design and programming course using an adaptive educational platform.<n>Participants were assigned to receive directive, metacognitive, or hybrid AI-generated feedback that blended elements of both directive and metacognitive feedback.<n>Results showed that revision behaviour differed across feedback conditions, with Hybrid prompting the most revisions compared to Directive and Metacognitive.
arXiv Detail & Related papers (2025-10-22T15:31:21Z)
AI-Educational Development Loop (AI-EDL): A Conceptual Framework to Bridge AI Capabilities with Classical Educational Theories [8.500617875591633]
This study introduces the AI-Educational Development Loop (AI-EDL), a theory-driven framework that integrates classical learning theories with human-in-the-loop artificial intelligence (AI)<n>The framework emphasizes transparency, self-regulated learning, and pedagogical oversight.
arXiv Detail & Related papers (2025-08-01T15:44:19Z)
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration [79.69935257008467]
We introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for Human-AI knowledge transfer capabilities.<n>We conduct the first large-scale human study (N=118) explicitly designed to measure it.<n>In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating model explanations' influence on human understanding.
arXiv Detail & Related papers (2025-06-05T20:48:16Z)
Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use [3.345149032274467]
This project examines the prospect of using AI-generated feedback as suggestions to expedite and enhance human instructors' feedback provision.<n>We developed a feedback engine that generates feedback on students' essays based on grading rubrics used by the teaching assistants (TAs)<n>We performed think-aloud studies with 5 TAs over 20 1-hour sessions to have them evaluate the AI feedback, contrast the AI feedback with their handwritten feedback, and share how they envision using the AI feedback if they were offered as suggestions.
arXiv Detail & Related papers (2025-05-21T14:50:30Z)
Understanding and Supporting Peer Review Using AI-reframed Positive Summary [18.686807993563168]
This study explored the impact of appending an automatically generated positive summary to the peer reviews of a writing task.<n>We found that adding an AI-reframed positive summary to otherwise harsh feedback increased authors' critique acceptance.<n>We discuss the implications of using AI in peer feedback, focusing on how it can influence critique acceptance and support research communities.
arXiv Detail & Related papers (2025-03-13T11:22:12Z)
Personalised Feedback Framework for Online Education Programmes Using Generative AI [0.0]
This paper presents an alternative feedback framework which extends the capabilities of ChatGPT by integrating embeddings. As part of the study, we proposed and developed a proof of concept solution, achieving an efficacy rate of 90% and 100% for open-ended and multiple-choice questions.
arXiv Detail & Related papers (2024-10-14T22:35:40Z)
Integrating AI for Enhanced Feedback in Translation Revision- A Mixed-Methods Investigation of Student Engagement [0.0]
The application of Artificial Intelligence (AI)-generated feedback, particularly from language models like ChatGPT, remains understudied in translation education. This study investigates the engagement of master's students in translation with ChatGPT-generated feedback during their revision process.
arXiv Detail & Related papers (2024-10-11T07:21:29Z)
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle [61.105703857868775]
We propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback. Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference. Finally, we determine which answer better fits human preferences according to the criticism.
arXiv Detail & Related papers (2024-06-17T03:51:46Z)
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking. We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert. We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied. We focus on how the evaluation of task-oriented dialogue systems ( TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes. We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function. Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present textscUltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
The Responsible Development of Automated Student Feedback with Generative AI [6.008616775722921]
Recent advancements in AI, particularly with large language models (LLMs), present new opportunities to deliver scalable, repeatable, and instant feedback.<n>However, implementing these technologies also introduces a host of ethical considerations that must thoughtfully be addressed.<n>One of the core advantages of AI systems is their ability to automate routine and mundane tasks, potentially freeing up human educators for more nuanced work.<n>However, the ease of automation risks a tyranny of the majority'', where the diverse needs of minority or unique learners are overlooked.
arXiv Detail & Related papers (2023-08-29T14:29:57Z)
Effects of Human vs. Automatic Feedback on Students' Understanding of AI Concepts and Programming Style [0.0]
The use of automatic grading tools has become nearly ubiquitous in large undergraduate programming courses. There is a relative lack of data directly comparing student outcomes when receiving computer-generated feedback and human-written feedback. This paper addresses this gap by splitting one 90-student class into two feedback groups and analyzing differences in the two cohorts' performance.
arXiv Detail & Related papers (2020-11-20T21:40:32Z)
Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agent learning from trainers' facial expressions via interpreting them as evaluative feedback. With designed CNN-RNN model, our analysis shows that telling trainers to use facial expressions and competition can improve the accuracies for estimating positive and negative feedback. Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)
Artificial Artificial Intelligence: Measuring Influence of AI 'Assessments' on Moral Decision-Making [48.66982301902923]
We examined the effect of feedback from false AI on moral decision-making about donor kidney allocation. We found some evidence that judgments about whether a patient should receive a kidney can be influenced by feedback about participants' own decision-making perceived to be given by AI.
arXiv Detail & Related papers (2020-01-13T14:15:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.