LLMs in the Classroom: Outcomes and Perceptions of Questions Written with the Aid of AI
- URL: http://arxiv.org/abs/2503.18995v1
- Date: Sun, 23 Mar 2025 22:01:49 GMT
- Title: LLMs in the Classroom: Outcomes and Perceptions of Questions Written with the Aid of AI
- Authors: Gavin Witsken, Igor Crk, Eren Gultepe
- Abstract summary: Students were unable to perceive whether questions were written with or without the aid of ChatGPT. Student scores on LLM-authored questions were almost 9% lower.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We randomly deploy questions constructed with and without use of the LLM tool and gauge the ability of the students to correctly answer, as well as their ability to correctly perceive the difference between human-authored and LLM-authored questions. In determining whether the questions written with the aid of ChatGPT were consistent with the instructor's questions and source text, we computed representative vectors of both the human and ChatGPT questions using SBERT and compared cosine similarity to the course textbook. A non-significant Mann-Whitney U test (z = 1.018, p = .309) suggests that students were unable to perceive whether questions were written with or without the aid of ChatGPT. However, student scores on LLM-authored questions were almost 9% lower (z = 2.702, p < .01). This result may indicate that either the AI questions were more difficult or that the students were more familiar with the instructor's style of questions. Overall, the study suggests that while there is potential for using LLM tools to aid in the construction of assessments, care must be taken to ensure that the questions are fair, well-composed, and relevant to the course material.
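The two checks described in the abstract, embedding similarity and a Mann-Whitney U test, can be sketched in plain Python. This is an illustrative sketch only: the paper uses SBERT sentence embeddings and standard statistical tooling, whereas here the vectors and score samples are hypothetical stand-ins, cosine similarity is computed directly, and the U test uses the normal approximation without tie correction.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors
    (in the paper, these would be SBERT embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mann_whitney_z(sample1, sample2):
    """Mann-Whitney U z-statistic via the normal approximation.
    Simplified sketch: assumes no tied values."""
    # Rank all observations jointly; indices 0..n1-1 belong to sample1.
    combined = sorted((v, i) for i, v in enumerate(sample1 + sample2))
    ranks = {idx: rank + 1 for rank, (_, idx) in enumerate(combined)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(ranks[i] for i in range(n1))       # rank sum of sample1
    u1 = r1 - n1 * (n1 + 1) / 2                 # U statistic for sample1
    mu = n1 * n2 / 2                            # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # std of U under H0
    return (u1 - mu) / sigma
```

In practice one would obtain the vectors from a sentence-embedding model and use a library routine (e.g. `scipy.stats.mannwhitneyu`) that handles ties and exact p-values; the sketch only shows the shape of the computation.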
Related papers
- NLP Methods May Actually Be Better Than Professors at Estimating Question Difficulty
We compare various Large Language Model-based methods with three professors in their ability to estimate what percentage of students will give correct answers on True/False exam questions. We obtained even better results using the uncertainties of the LLMs solving the questions in a supervised learning setting, using only 42 training samples.
arXiv Detail & Related papers (2025-08-05T10:12:38Z) - Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny
Students in computing education increasingly use large language models (LLMs) such as ChatGPT. This paper investigates how students interact with an LLM when solving formal verification exercises in Dafny.
arXiv Detail & Related papers (2025-06-27T16:34:13Z) - On the effectiveness of LLMs for automatic grading of open-ended questions in Spanish
This paper explores the performance of different LLMs and prompting techniques in automatically grading short-text answers to open-ended questions.
Results are notably sensitive to prompt styles, suggesting biases toward certain words or content in the prompt.
arXiv Detail & Related papers (2025-03-23T13:43:27Z) - Which Questions Improve Learning the Most? Utility Estimation of Questions with LM-based Simulations
We introduce QUEST, a framework that uses language models to simulate learners and quantify the utility of a question. QUEST simulates a learner who asks questions and receives answers while studying a textbook chapter, then uses them to take an end-of-chapter exam. Experiments show that questions generated by QUEST-trained models improve simulated test scores by over 20% compared to strong baselines.
arXiv Detail & Related papers (2025-02-24T18:08:41Z) - Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students
Large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education. Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs. We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.
arXiv Detail & Related papers (2024-11-27T19:19:34Z) - Comparison of Large Language Models for Generating Contextually Relevant Questions
GPT-3.5, Llama 2-Chat 13B, and T5 XXL are compared in their ability to create questions from university slide text without fine-tuning.
Results indicate that GPT-3.5 and Llama 2-Chat 13B outperform T5 XXL by a small margin, particularly in terms of clarity and question-answer alignment.
arXiv Detail & Related papers (2024-07-30T06:23:59Z) - How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading
We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles.
We explore various approaches to generate such questions using language models.
We conduct a human study to understand the implication of such questions on reading comprehension.
arXiv Detail & Related papers (2024-07-19T13:42:56Z) - Crafting Interpretable Embeddings by Asking LLMs Questions
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM.
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z) - Which questions should I answer? Salience Prediction of Inquisitive Questions
We show that highly salient questions are empirically more likely to be answered in the same article.
We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
arXiv Detail & Related papers (2024-04-16T21:33:05Z) - Can AI Assistants Know What They Don't Know?
Refusing to answer questions it does not know is a crucial way for an AI assistant to reduce hallucinations and remain truthful.
We construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions.
After alignment with the Idk dataset, the assistant can refuse to answer most of its unknown questions.
arXiv Detail & Related papers (2024-01-24T07:34:55Z) - Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning
We discuss the challenges associated with employing large language models to enhance students' mathematical problem-solving skills.
LLMs can generate incorrect reasoning processes and also have difficulty understanding the given questions' rationales when attempting to correct students' answers.
arXiv Detail & Related papers (2023-10-20T16:05:35Z) - Automatic Generation of Socratic Subquestions for Teaching Math Word Problems
We explore the ability of large language models (LMs) to generate sequential questions for guiding math word problem-solving.
On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions.
Results suggest that the difficulty level of problems plays an important role in determining whether questioning improves or hinders human performance.
arXiv Detail & Related papers (2022-11-23T10:40:22Z) - A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.