Related papers: MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education

Related papers

ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints [42.713620384054146]
This paper introduces ORIGAMISPACE, a new dataset and benchmark designed to evaluate the multi-step spatial reasoning ability.<n>We propose four evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, and End-to-End CP Code Generation.
arXiv Detail & Related papers (2025-11-23T13:42:22Z)
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images [69.93976232543066]
We propose CodePlot-CoT, a code-driven Chain-of-Thought paradigm for "thinking with images" in mathematics.<n>To achieve this, we first construct Math-VR, the first large-scale, bilingual dataset and benchmark for Mathematics problems with Visual Reasoning.<n>Our model achieves up to 21% increase over base model on our new benchmark, fully validating the efficacy of our proposed code-driven reasoning paradigm.
arXiv Detail & Related papers (2025-10-13T17:59:55Z)
OnlineMate: An LLM-Based Multi-Agent Companion System for Cognitive Support in Online Learning [20.08144763551689]
We propose OnlineMate, a multi-agent learning companion system driven by large language models (LLMs)<n>OnlineMate is capable of simulating peer-like agent roles, adapting to learners' cognitive states during collaborative discussions, and inferring their psychological states, such as misunderstandings, confusion, or motivation.<n> Experimental results in simulated learning scenarios demonstrate that OnlineMate effectively fosters deep learning and discussions while enhancing cognitive engagement in online educational settings.
arXiv Detail & Related papers (2025-09-18T09:56:45Z)
Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z)
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science [62.96434290874878]
Current Multi-Modal Large Language Models (MLLM) have shown strong capabilities in general visual reasoning tasks. We develop a new framework, named Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS) based on an MLLM. MAPS decomposes expert-level multi-modal reasoning task into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator.
arXiv Detail & Related papers (2025-01-18T13:54:00Z)
Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks? [6.7065734065794835]
We introduce a novel learning paradigm termed MLLM4WTAL. It harnesses the potential of MLLM to offer temporal action key semantics and complete semantic priors. It achieves this by integrating two distinct modules: Key Semantic Matching (KSM) and Complete Semantic Reconstruction (CSR)
arXiv Detail & Related papers (2024-11-13T09:37:24Z)
Students Rather Than Experts: A New AI For Education Pipeline To Model More Human-Like And Personalised Early Adolescences [11.576679362717478]
This study focuses on language learning as a context for modeling virtual student agents. By curating a dataset of personalized teacher-student interactions with various personality traits, we conduct multi-dimensional evaluation experiments.
arXiv Detail & Related papers (2024-10-21T07:18:24Z)
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments [1.4999444543328293]
Simulating learner actions helps stress-test open-ended interactive learning environments and prototype new adaptations before deployment. We propose Hyp-Mix, a simulation authoring framework that allows experts to develop and evaluate simulations by combining testable hypotheses about learner behavior.
arXiv Detail & Related papers (2024-10-03T00:25:40Z)
Simulating Classroom Education with LLM-Empowered Agents [52.62324491261461]
SimClass is a multi-agent classroom simulation framework involving user participation. We recognize representative class roles and introduce a novel class control mechanism for automatic classroom teaching. We demonstrate that LLMs can simulate traditional classroom interaction patterns effectively while enhancing user's experience.
arXiv Detail & Related papers (2024-06-27T14:51:07Z)
What is the Visual Cognition Gap between Humans and Multimodal LLMs? [63.81347276258992]
We evaluate the visual cognition capability of Multimodal Large Language Models (MLLMs) and compare their performance with human visual cognition studies.<n>Our comparative experiments with different baselines reveal a gap between MLLMs and human intelligence.<n>We believe that the public release of MaRs-VQA and the Qwen2-VCog baseline model will drive progress toward the next generation of MLLMs with human-like visual cognition abilities.
arXiv Detail & Related papers (2024-06-14T22:02:21Z)
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions [58.57255822646756]
This paper introduces MathChat, a benchmark designed to evaluate large language models (LLMs) across a broader spectrum of mathematical tasks. We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single turn question answering, they significantly underperform in more complex scenarios. We develop MathChat sync, a synthetic dialogue based math dataset for LLM finetuning, focusing on improving models' interaction and instruction following capabilities in conversations.
arXiv Detail & Related papers (2024-05-29T18:45:55Z)
ST-LLM: Large Language Models Are Effective Temporal Learners [58.79456373423189]
Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation. How to effectively encode and understand videos in video-based dialogue systems remains to be solved. We propose ST-LLM, an effective video-LLM baseline with spatial-temporal sequence modeling inside LLM.
arXiv Detail & Related papers (2024-03-30T10:11:26Z)
Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs. We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm. To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts. This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals. We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics [32.123919380959485]
Multi-modal large language models (MLLMs) are trained based on large language models (LLM) While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested. We show that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment.
arXiv Detail & Related papers (2023-09-13T17:57:21Z)
Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs) We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics. We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home [8.819665252533104]
This work explores Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space. Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home deployment data.
arXiv Detail & Related papers (2023-06-01T09:31:57Z)
ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind [3.9599054392856483]
We present ToMChallenges, a dataset for comprehensively evaluating the Theory of Mind based on the Sally-Anne and Smarties tests with a diverse set of tasks. Our evaluation results and error analyses show that LLMs have inconsistent behaviors across prompts and tasks.
arXiv Detail & Related papers (2023-05-24T11:54:07Z)
End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics [8.819665252533104]
This work presents a task-oriented Spoken Dialogue System (SDS) built to support play-based learning of basic math concepts for early childhood education. The system has been evaluated via real-world deployments at school while the students are practicing early math concepts with multimodal interactions.
arXiv Detail & Related papers (2022-11-07T12:58:24Z)
MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays. We provide a new perspective to the working mechanism of MAML and discover that: MAML is analogous to a meta-learner using a supervised contrastive objective function. We propose a simple but effective technique, zeroing trick, to alleviate such interference.
arXiv Detail & Related papers (2021-06-29T12:52:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.