A Meta-Analysis of LLM Effects on Students across Qualification, Socialisation, and Subjectification
- URL: http://arxiv.org/abs/2509.22725v2
- Date: Tue, 30 Sep 2025 10:22:27 GMT
- Title: A Meta-Analysis of LLM Effects on Students across Qualification, Socialisation, and Subjectification
- Authors: Jiayu Huang, Ruoxin Ritter Wang, Jen-Hao Liu, Boming Xia, Yue Huang, Ruoxi Sun, Jason Minhui Xue, Jinan Zou,
- Abstract summary: Large language models (LLMs) are increasingly positioned as solutions for education, yet evaluations often reduce their impact to narrow performance metrics.<n>This paper reframes the question by asking "what kind of impact should LLMs have in education?"<n>We present a meta-analysis of 133 experimental and quasi-experimental studies (k = 188)
- Score: 14.218377341186892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly positioned as solutions for education, yet evaluations often reduce their impact to narrow performance metrics. This paper reframes the question by asking "what kind of impact should LLMs have in education?" Drawing on Biesta's tripartite account of good education: qualification, socialisation, and subjectification, we present a meta-analysis of 133 experimental and quasi-experimental studies (k = 188). Overall, the impact of LLMs on student learning is positive but uneven. Strong effects emerge in qualification, particularly when LLMs function as tutors in sustained interventions. Socialisation outcomes appear more variable, concentrated in sustained, reflective interventions. Subjectification, linked to autonomy and learner development, remains fragile, with improvements confined to small-scale, long-term studies. This purpose-level view highlights design as the decisive factor: without scaffolds for participation and agency, LLMs privilege what is easiest to measure while neglecting broader aims of education. For HCI and education, the issue is not just whether LLMs work, but what futures they enable or foreclose.
Related papers
- ZPD-SCA: Unveiling the Blind Spots of LLMs in Assessing Students' Cognitive Abilities [19.704166621790836]
Large language models (LLMs) have demonstrated potential in educational applications, yet their capacity to accurately assess the cognitive alignment of reading materials remains insufficiently explored.<n>We introduce ZPD-SCA, a novel benchmark specifically designed to assess stage-level Chinese reading comprehension difficulty.<n> Experimental results reveal that LLMs perform poorly in zero-shot learning scenarios, with Qwen-max and GLM even falling below the probability of random guessing.
arXiv Detail & Related papers (2025-08-20T03:08:47Z) - MLLMs are Deeply Affected by Modality Bias [158.64371871084478]
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images.<n>MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs.<n>This paper argues that MLLMs are deeply affected by modality bias, highlighting its manifestations across various tasks.
arXiv Detail & Related papers (2025-05-24T11:49:31Z) - Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes [84.1059652774853]
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.<n>Recent studies have exposed critical limitations in their spatial reasoning capabilities.<n>This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world.
arXiv Detail & Related papers (2025-04-21T11:48:39Z) - Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond [39.39558417665764]
Large language models (LLMs) should undergo rigorous audits to identify potential risks, such as copyright and privacy infringements.<n>We propose a toolkit of the gradient effect (G-effect), quantifying the impacts of unlearning objectives on model performance.
arXiv Detail & Related papers (2025-02-26T16:59:21Z) - Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models [3.005864877840858]
ChatGPT and other similar tools have prompted tremendous public excitement and experimental effort about the potential of large language models (LLMs) to improve learning experience and outcomes.
However, little research has systematically examined the real-world impacts of LLM availability on educational equity.
We analyze 1,140,328 academic writing submissions from 16,791 college students across 2,391 courses between 2021 and 2024 at a public, minority-serving institution in the US.
We find that students' overall writing quality gradually increased following the availability of LLMs and that the writing quality gaps between linguistically advantaged and disadvantaged students became increasingly narrower
arXiv Detail & Related papers (2024-10-29T17:35:46Z) - The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z) - AI Meets the Classroom: When Do Large Language Models Harm Learning? [0.0]
We show that the effect of large language models (LLMs) on learning outcomes depends on usage behavior.<n>While LLMs show great potential to improve learning, their use must be tailored to the educational context.
arXiv Detail & Related papers (2024-08-29T17:07:46Z) - The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLM)
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z) - Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, architectures and other factors profoundly impact the performance of LLMs.
Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods.
This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering technique.
arXiv Detail & Related papers (2024-03-22T14:47:35Z) - A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.