Combining physics education and machine learning research to measure evidence of students' mechanistic sensemaking
- URL: http://arxiv.org/abs/2503.15638v3
- Date: Wed, 17 Sep 2025 18:05:55 GMT
- Title: Combining physics education and machine learning research to measure evidence of students' mechanistic sensemaking
- Authors: Kaitlin Gili, Kyle Heuton, Astha Shah, David Hammer, Michael C. Hughes
- Abstract summary: We report on progress in the design of an ML-based tool to analyze students' mechanistic sensemaking. We describe pilot tests of the tool, in three versions with different language encoders, to analyze sensemaking evident in college students' written responses to brief conceptual questions.
- Score: 3.82216862698789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in machine learning (ML) offer new possibilities for science education research. We report on early progress in the design of an ML-based tool to analyze students' mechanistic sensemaking, working from a coding scheme that is aligned with previous work in physics education research (PER) and amenable to recently developed ML classification strategies using language encoders. We describe pilot tests of the tool, in three versions with different language encoders, to analyze sensemaking evident in college students' written responses to brief conceptual questions. The results show, first, that the tool's measurements of sensemaking can achieve useful agreement with a human coder, and, second, that encoder design choices entail a tradeoff between accuracy and computational expense. We discuss the promise and limitations of this approach, providing examples as to how this measurement scheme may serve PER in the future. We conclude with reflections on the use of ML to support PER research, with cautious optimism for strategies of co-design between PER and ML.
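The pipeline the abstract describes, encoding written responses, classifying them against a coding scheme, and checking agreement with a human coder, can be illustrated with a deliberately minimal sketch. The bag-of-words encoder, toy responses, and binary sensemaking labels below are hypothetical stand-ins for the paper's language encoders and coding scheme; agreement is measured with Cohen's kappa, a standard chance-corrected inter-rater statistic.

```python
from collections import Counter
import math

def encode(text):
    """Toy bag-of-words encoder (stand-in for a neural language encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_centroid(train, response):
    """Classify a response with the label of its most similar labeled example."""
    vec = encode(response)
    best = max(train, key=lambda ex: cosine(vec, encode(ex[0])))
    return best[1]

def cohen_kappa(a, b):
    """Chance-corrected agreement between two coders' label sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

# Hypothetical labeled responses: 1 = mechanistic sensemaking evident, 0 = not.
train = [
    ("the charges push each other apart because like charges repel", 1),
    ("the field does work on the charge causing it to accelerate", 1),
    ("the answer is choice b", 0),
    ("i remember this formula from the textbook", 0),
]
human = [1, 0]
machine = [nearest_centroid(train, r) for r in [
    "the ball speeds up because gravity keeps pulling it down",
    "it is just choice c",
]]
print(cohen_kappa(human, machine))  # → 1.0
```

A real system would swap `encode` for the paper's language encoders, which is exactly where the abstract's accuracy-versus-compute tradeoff enters: heavier encoders embed responses more faithfully at greater cost.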
Related papers
- Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z) - From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code. We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs. We analyze the code capability of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z) - PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models [69.73115077227969]
We present PhysUniBench, a large-scale benchmark designed to evaluate and improve the reasoning capabilities of multimodal large language models (MLLMs). PhysUniBench consists of 3,304 physics questions spanning 8 major sub-disciplines of physics, each accompanied by one visual diagram. The benchmark's construction involved a rigorous multi-stage process, including multiple roll-outs, expert-level evaluation, automated filtering of easily solved problems, and a nuanced difficulty grading system with five levels.
arXiv Detail & Related papers (2025-06-21T09:55:42Z) - Can Theoretical Physics Research Benefit from Language Agents? [50.57057488167844]
Large Language Models (LLMs) are rapidly advancing across diverse domains, yet their application in theoretical physics research is not yet mature. This position paper argues that LLM agents can potentially help accelerate theoretical, computational, and applied physics when properly integrated with domain knowledge and toolboxes. We envision future physics-specialized LLMs that could handle multimodal data, propose testable hypotheses, and design experiments.
arXiv Detail & Related papers (2025-06-06T16:20:06Z) - Dissecting Physics Reasoning in Small Language Models: A Multi-Dimensional Analysis from an Educational Perspective [0.0]
Small Language Models (SLMs) offer computational efficiency and accessibility. This study investigates the high school physics reasoning capabilities of state-of-the-art SLMs.
arXiv Detail & Related papers (2025-05-27T04:33:13Z) - MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving [3.2962799070467432]
This paper explores the capabilities of large language models (LLMs) to assess students' math problem-solving processes and provide adaptive feedback. We evaluate the model's ability to support personalized learning in two scenarios: one where the model has access to students' prior answer histories, and another simulating a cold-start context.
arXiv Detail & Related papers (2025-05-23T15:59:39Z) - Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning [50.53703102032562]
Large-scale Transformer language models (LMs) trained solely on next-token prediction with web-scale data can solve a wide range of tasks. The mechanism behind this capability, known as in-context learning (ICL), remains both controversial and poorly understood.
arXiv Detail & Related papers (2025-05-16T08:50:42Z) - Advancing AI Research Assistants with Expert-Involved Learning [84.30323604785646]
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework. We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z) - LLM-based Cognitive Models of Students with Misconceptions [55.29525439159345]
This paper investigates whether Large Language Models (LLMs) can be instruction-tuned to meet this dual requirement.
We introduce MalAlgoPy, a novel Python library that generates datasets reflecting authentic student solution patterns.
Our insights enhance our understanding of AI-based student models and pave the way for effective adaptive learning systems.
arXiv Detail & Related papers (2024-10-16T06:51:09Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains under consistent notation, which the current literature lacks.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression [0.6666419797034796]
Interleaved practice enhances the memory and problem-solving ability of students in undergraduate courses.
We introduce a personalized learning tool built on a Large Language Model (LLM) that can provide immediate and personalized attention to students.
arXiv Detail & Related papers (2024-06-17T13:43:30Z) - Investigating the Impact of SOLID Design Principles on Machine Learning Code Understanding [2.5788518098820337]
We investigated the impact of the SOLID design principles on Machine Learning code understanding.
We restructured real industrial ML code that did not use SOLID principles.
Results provided statistically significant evidence that the adoption of the SOLID design principles can improve code understanding.
arXiv Detail & Related papers (2024-02-08T00:44:45Z) - SimLM: Can Language Models Infer Parameters of Physical Systems? [56.38608628187024]
We investigate the performance of Large Language Models (LLMs) at performing parameter inference in the context of physical systems.
Our experiments suggest that they are not inherently suited to this task, even for simple systems.
We propose a promising direction of exploration, which involves the use of physical simulators to augment the context of LLMs.
arXiv Detail & Related papers (2023-12-21T12:05:19Z) - A Machine Learning-oriented Survey on Tiny Machine Learning [9.690117347832722]
The emergence of Tiny Machine Learning (TinyML) has revolutionized the field of Artificial Intelligence.
TinyML plays an essential role within the fourth and fifth industrial revolutions, helping societies, economies, and individuals employ effective AI-infused computing technologies.
arXiv Detail & Related papers (2023-09-21T09:47:12Z) - X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events [75.94926117990435]
This study introduces X-VoE, a benchmark dataset to assess AI agents' grasp of intuitive physics.
X-VoE establishes a higher bar for the explanatory capacities of intuitive physics models.
We present an explanation-based learning system that captures physics dynamics and infers occluded object states.
arXiv Detail & Related papers (2023-08-21T03:28:23Z) - Physics-Based Task Generation through Causal Sequence of Physical Interactions [3.2244944291325996]
Performing tasks in a physical environment is a crucial yet challenging problem for AI systems operating in the real world.
We present a systematic approach for defining a physical scenario using a causal sequence of physical interactions between objects.
We then propose a methodology for generating tasks in a physics-simulating environment using defined scenarios as inputs.
arXiv Detail & Related papers (2023-08-05T10:15:18Z) - Towards Understanding Machine Learning Testing in Practise [23.535630175567146]
We propose to study visualisations of Machine Learning pipelines by mining Jupyter notebooks.
First, we gather general insights and trends through a qualitative study of a smaller sample of notebooks.
We then use the knowledge gained from the qualitative study to design an empirical study with a larger sample of notebooks.
arXiv Detail & Related papers (2023-05-08T18:52:26Z) - Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning [34.006227676170504]
This study investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents.
Experiments on two large-scale datasets demonstrate the rationale of our insights.
arXiv Detail & Related papers (2023-04-22T12:26:24Z) - Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes.
Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z) - Lila: A Unified Benchmark for Mathematical Reasoning [59.97570380432861]
LILA is a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions.
We construct our benchmark by extending 20 existing datasets, collecting task instructions and solutions in the form of Python programs.
We introduce BHASKARA, a general-purpose mathematical reasoning model trained on LILA.
arXiv Detail & Related papers (2022-10-31T17:41:26Z) - Symmetry Group Equivariant Architectures for Physics [52.784926970374556]
In the domain of machine learning, an awareness of symmetries has driven impressive performance breakthroughs.
We argue that both the physics community and the broader machine learning community have much to understand.
arXiv Detail & Related papers (2022-03-11T18:27:04Z) - Physics-informed Reinforcement Learning for Perception and Reasoning about Fluids [0.0]
We propose a physics-informed reinforcement learning strategy for fluid perception and reasoning from observations.
We develop a method for the tracking (perception) and analysis (reasoning) of any previously unseen liquid whose free surface is observed with a commodity camera.
arXiv Detail & Related papers (2022-03-11T07:01:23Z) - Privacy-preserving machine learning with tensor networks [37.01494003138908]
We show that tensor network architectures have especially promising properties for privacy-preserving machine learning.
First, we describe a new privacy vulnerability that is present in feedforward neural networks, illustrating it in synthetic and real-world datasets.
We rigorously prove that such conditions are satisfied by tensor-network architectures.
arXiv Detail & Related papers (2022-02-24T19:04:35Z) - Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective, that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for the mechanical design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
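The few-shot setup this abstract describes follows the prototypical-network recipe: average the embeddings of the few instructor-labeled examples in each feedback class, then assign a new student solution to the class with the nearest prototype. The 2-D embeddings and feedback labels below are hypothetical stand-ins; a real system would embed student code with a learned encoder.

```python
import math

def prototype(vectors):
    """Class prototype: the elementwise mean of the support embeddings."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def classify(protos, query):
    """Assign the query embedding to the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(protos[label], query))

# Hypothetical embeddings of student submissions, a few per feedback class.
support = {
    "off_by_one":      [[0.9, 0.1], [1.1, 0.0], [1.0, 0.2]],
    "wrong_base_case": [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1]],
}
protos = {label: prototype(vecs) for label, vecs in support.items()}
print(classify(protos, [0.95, 0.15]))  # → off_by_one
```

Because only the prototypes change between questions, adapting to a new programming question needs just the few instructor examples, which is what makes the approach deployable at exam scale.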
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Jointly Modeling Heterogeneous Student Behaviors and Interactions Among Multiple Prediction Tasks [35.15654921278549]
Prediction tasks about students have practical significance for both students and colleges.
In this paper, we focus on modeling heterogeneous behaviors and making multiple predictions together.
We design three motivating behavior prediction tasks based on a real-world dataset collected from a college.
arXiv Detail & Related papers (2021-03-25T02:01:58Z) - Automatic coding of students' writing via Contrastive Representation Learning in the Wasserstein space [6.884245063902909]
This work is a step towards building a statistical machine learning (ML) method for supporting qualitative analyses of students' writing.
We show that the ML algorithm approached the inter-rater reliability of human analysis.
arXiv Detail & Related papers (2020-11-26T16:52:48Z) - Explainable Empirical Risk Minimization [0.6299766708197883]
Successful application of machine learning (ML) methods becomes increasingly dependent on their interpretability or explainability.
This paper applies information-theoretic concepts to develop a novel measure for the subjective explainability of predictions delivered by a ML method.
Our main contribution is the explainable empirical risk minimization (EERM) principle of learning a hypothesis that optimally balances between the subjective explainability and risk.
arXiv Detail & Related papers (2020-09-03T07:16:34Z)
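The balance the EERM abstract describes has the familiar shape of a regularized objective; schematically (the penalty term C(h) and this exact functional form are illustrative stand-ins, not the paper's definitions):

```latex
\hat{h} \;=\; \arg\min_{h \in \mathcal{H}}
  \underbrace{\frac{1}{n} \sum_{i=1}^{n} \ell\big(h(x_i), y_i\big)}_{\text{empirical risk}}
  \;+\; \lambda \,
  \underbrace{C(h)}_{\text{explainability penalty}}
```

where the coefficient \(\lambda \ge 0\) trades predictive risk against the subjective explainability of the hypothesis \(h\).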
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.