LLAMA LIMA: A Living Meta-Analysis on the Effects of Generative AI on Learning Mathematics
- URL: http://arxiv.org/abs/2601.18685v1
- Date: Mon, 26 Jan 2026 17:00:52 GMT
- Title: LLAMA LIMA: A Living Meta-Analysis on the Effects of Generative AI on Learning Mathematics
- Authors: Anselm Strohmaier, Samira Bödefeld, Frank Reinhold
- Abstract summary: We present a Living Meta-Analysis (LIMA) on the effects of generative AI-based interventions for learning mathematics. We continuously update the literature base, apply a Bayesian multilevel meta-regression model to account for cumulative data, and publish updated versions on a preprint server at regular intervals.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capabilities of generative AI in mathematics education are rapidly evolving, posing significant challenges for research to keep pace. Research syntheses remain scarce and risk being outdated by the time of publication. To address this issue, we present a Living Meta-Analysis (LIMA) on the effects of generative AI-based interventions for learning mathematics. Following PRISMA-LSR guidelines, we continuously update the literature base, apply a Bayesian multilevel meta-regression model to account for cumulative data, and publish updated versions on a preprint server at regular intervals. This paper reports results from the first version, including 15 studies. The analyses indicate a small positive effect (g = 0.31) with a wide credible interval [0.06, 0.58], reflecting the still limited evidence base.
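The abstract pools per-study effect sizes (Hedges' g) into an overall estimate. As a rough illustration of how such pooling works, the sketch below implements a classical DerSimonian-Laird random-effects meta-analysis in plain Python. This is not the authors' Bayesian multilevel model, and all study values are invented; it only shows how between-study heterogeneity widens the interval around the pooled effect, mirroring the paper's wide interval around g = 0.31.

```python
# Minimal random-effects meta-analysis (DerSimonian-Laird), for illustration
# only. This is a frequentist sketch, NOT the paper's Bayesian multilevel
# meta-regression; all effect sizes and variances below are invented.
import math

def pooled_effect(effects, variances):
    """Pool per-study effect sizes using random-effects weights."""
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fe = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    # Cochran's Q and between-study variance tau^2 (DerSimonian-Laird)
    q = sum(wi * (gi - fe) ** 2 for wi, gi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    g = sum(wi * gi for wi, gi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return g, se

# Hypothetical Hedges' g values and sampling variances for five studies
effects = [0.80, -0.05, 0.62, 0.10, 0.45]
variances = [0.04, 0.06, 0.09, 0.05, 0.07]
g, se = pooled_effect(effects, variances)
print(f"pooled g = {g:.2f}, 95% CI = [{g - 1.96*se:.2f}, {g + 1.96*se:.2f}]")
```

With heterogeneous study effects like these, tau^2 is positive and the interval is noticeably wider than a fixed-effect analysis would give; a Bayesian model would instead report a credible interval from the posterior.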
Related papers
- LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics [5.676144562388248]
We present a new approach for benchmarking Large Language Model capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for mathematical research. Instead, we establish an updatable benchmark evaluating models directly on the latest research results in mathematics.
arXiv Detail & Related papers (2026-02-27T16:52:52Z)
- Evolutionary Strategies lead to Catastrophic Forgetting in LLMs [51.91763220981834]
Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms. ES is able to reach performance close to GRPO on math and reasoning tasks with a comparable compute budget. However, ES is accompanied by significant forgetting of prior abilities, limiting its applicability for training models online.
arXiv Detail & Related papers (2026-01-28T18:59:34Z)
- Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints [15.070885964897734]
Generative large language models (LLMs) introduce a further potential disruption by altering how manuscripts are written. This paper addresses the gap through a large-scale analysis of more than 2.1 million preprints spanning 2016-2025 (115 months) across four major repositories. Our findings reveal that LLMs have accelerated submission and revision cycles, modestly increased linguistic complexity, and disproportionately expanded AI-related topics.
arXiv Detail & Related papers (2025-10-18T01:37:40Z)
- Gen AI in Proof-based Math Courses: A Pilot Study [0.0]
This study examines student use and perceptions of generative AI across three proof-based undergraduate mathematics courses. We analyze how students engaged with AI tools, their perceptions of generative AI's usefulness and limitations, and what implications these perceptions hold for teaching proof-based mathematics.
arXiv Detail & Related papers (2025-09-16T22:18:12Z)
- Measuring Human Involvement in AI-Generated Text: A Case Study on Academic Writing [39.5254201243129]
A survey revealed that nearly 30% of college students use generative AI to help write academic papers and reports. Most countermeasures treat the detection of AI-generated text as a binary classification task and thus lack robustness. This approach overlooks human involvement in the generation of content, even though human-machine collaboration is becoming mainstream. We propose using BERTScore as a metric to measure human involvement in the generation process, together with a multi-task RoBERTa-based regressor trained on a token classification task.
arXiv Detail & Related papers (2025-06-04T02:31:36Z)
- LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving. LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge. It generates formal proofs for 155 theorems across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine Learning [78.13481522957552]
Machine learning models are sensitive to spurious correlations between non-essential features of the inputs and the corresponding labels. This paper provides a comprehensive survey of this emerging issue, along with a fine-grained taxonomy of existing state-of-the-art methods for addressing spurious correlations in machine learning models.
arXiv Detail & Related papers (2024-02-20T04:49:34Z)
- Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Submissions using AI Detection Tool [0.0]
This study analyzes a method for detecting purposely manufactured (AI-generated) content in academic submissions posted to arXiv.
The statistical analysis shows that the detection tool Originality.ai is highly accurate, with a reported rate of 98%.
arXiv Detail & Related papers (2024-02-09T17:20:48Z)
- Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
We focus on two popular reasoning tasks: arithmetic reasoning and code generation.
We introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method to apply these perturbations, and (iii) two datasets.
We show a significant performance drop across all the models against perturbed questions.
arXiv Detail & Related papers (2024-01-17T18:13:07Z)
- Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work [0.38850145898707145]
Large language models (LLMs) and generative AI tools present challenges in identifying and addressing biases.
Generative AI tools are susceptible to goal misgeneralization, hallucinations, and adversarial attacks such as red-teaming prompts.
We find that incorporating generative AI in the process of writing research manuscripts introduces a new type of context-induced algorithmic bias.
arXiv Detail & Related papers (2023-12-04T04:05:04Z)
- Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z)
- EPARS: Early Prediction of At-risk Students with Online and Offline Learning Behaviors [55.33024245762306]
Early prediction of students at risk (STAR) is an effective means of enabling timely intervention against dropout and suicide.
Existing works mostly rely on either online or offline learning behaviors, which are not comprehensive enough to capture the whole learning process.
We propose a novel algorithm (EPARS) that can predict STAR early in a semester by modeling both online and offline learning behaviors.
arXiv Detail & Related papers (2020-06-06T12:56:26Z)
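The EPARS blurb above describes combining online and offline behavioral signals into an early-warning prediction. As a toy sketch of that idea only, the snippet below blends invented engagement features into a single risk score; the feature names, scalings, weights, and threshold are all hypothetical and are not the authors' learned model.

```python
# Toy early-warning score blending online and offline learning signals.
# Feature names, weights, and the 0.5 threshold are invented for
# illustration; EPARS itself learns from much richer behavior data.
def risk_score(lms_logins_per_week, assignments_submitted_ratio,
               library_visits_per_week, attendance_regularity):
    # Lower engagement -> higher risk; each term is scaled into [0, 1].
    online = 0.5 * (1 - min(lms_logins_per_week / 10, 1.0)) \
           + 0.5 * (1 - assignments_submitted_ratio)
    offline = 0.5 * (1 - min(library_visits_per_week / 5, 1.0)) \
            + 0.5 * (1 - attendance_regularity)
    return 0.6 * online + 0.4 * offline  # weighted blend of both channels

at_risk = risk_score(1, 0.3, 0, 0.4) > 0.5   # disengaged student
engaged = risk_score(9, 0.95, 4, 0.9) > 0.5  # active student
print(at_risk, engaged)  # prints: True False
```

The point of the blend is the one the abstract makes: either channel alone misses part of the learning process, so the score weights both.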
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.