Related papers: COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

URL: http://arxiv.org/abs/2407.11315v1
Date: Tue, 16 Jul 2024 02:02:16 GMT
Title: COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation
Authors: Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun,
Abstract summary: This paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. From the perspective of mutual ability promotion and application logic, we unify stem generation and problem solving into mathematical problem generation. The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience to draw parallels with experiences in the career growth of teachers.
Score: 12.01484402197104
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal model provides a novel technical approach for the mathematical problem generation because of its wide success in cross-modal data scenarios. However, the traditional method of separating problem solving from problem generation and the mainstream fine-tuning framework of monotonous data structure with homogeneous training objectives limit the application of large multimodal model in mathematical problem generation. Addressing these challenges, this paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. Firstly, from the perspective of mutual ability promotion and application logic, we unify stem generation and problem solving into mathematical problem generation. Secondly, a three-stage fine-turning framework guided by the "Cone of Experience" is proposed. The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience to draw parallels with experiences in the career growth of teachers. Several fine-grained data construction and injection methods are designed in this framework. Finally, we construct a Chinese multimodal mathematical problem dataset to fill the vacancy of Chinese multimodal data in this field. Combined with objective and subjective indicators, experiments on multiple datasets fully verify the effectiveness of the proposed framework and model.

Related papers

ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges [72.19809898215857]
We introduce ModelingBench, a novel benchmark featuring real-world-inspired, open-ended problems from math modeling competitions across diverse domains.<n>These tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports.<n>We also present ModelingAgent, a multi-agent framework that coordinates tool use, supports structured, creative solutions, and generates well-grounded, creative solutions.
arXiv Detail & Related papers (2025-05-21T03:33:23Z)
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class [27.93059568425132]
HARDMath2 is a dataset of 211 original problems covering the core topics in a graduate applied math class.<n>This dataset was designed and verified by the students and instructors of a core graduate applied mathematics course at Harvard.<n>We build the dataset through a novel collaborative environment that challenges students to write and refine difficult problems consistent with the class syllabus.
arXiv Detail & Related papers (2025-05-17T00:52:49Z)
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection [53.325457460187046]
We introduce MathAgent, a novel Mixture-of-Math-Agent framework designed specifically to address these challenges. MathAgent decomposes error detection into three phases, each handled by a specialized agent. We evaluate MathAgent on real-world educational data, demonstrating approximately 5% higher accuracy in error step identification.
arXiv Detail & Related papers (2025-03-23T16:25:08Z)
Continual Multimodal Contrastive Learning [70.60542106731813]
Multimodal contrastive learning (MCL) advances in aligning different modalities and generating multimodal representations in a joint space. However, a critical yet often overlooked challenge remains: multimodal data is rarely collected in a single process, and training from scratch is computationally expensive. In this paper, we formulate CMCL through two specialized principles of stability and plasticity. We theoretically derive a novel optimization-based method, which projects updated gradients from dual sides onto subspaces where any gradient is prevented from interfering with the previously learned knowledge.
arXiv Detail & Related papers (2025-03-19T07:57:08Z)
Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available. We propose a Generative Transfer Learning framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data. Our finds demonstrate that GTL has superior performance compared to state-of-the-art methods across four distinct multi-modal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z)
HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities. It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models [41.02149566318779]
We propose a Chinese Multi-modal Math Skill Evaluation Benchmark, named CMMaTH, contraining 23k multimodal K12 math related questions. We have constructed an open-source tool GradeGPT integrated with the CMMaTH dataset, facilitating stable, rapid, and cost-free model evaluation.
arXiv Detail & Related papers (2024-06-28T02:35:51Z)
Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences. Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
Multimodal Large Language Models: A Survey [36.06016060015404]
Multimodal language models integrate multiple data types, such as images, text, language, audio, and other heterogeneity. This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms. A practical guide is provided, offering insights into the technical aspects of multimodal models. Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development.
arXiv Detail & Related papers (2023-11-22T05:15:12Z)
MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs) We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles. Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z)
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving [77.51817534090789]
We propose textbfJiuZhang2.0, a unified Chinese PLM specially for multi-task mathematical problem solving. Our idea is to maintain a moderate-sized model and employ the emphcross-task knowledge sharing to improve the model capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z)
Multi-Modal Experience Inspired AI Creation [33.34566822058209]
We study how to generate texts based on sequential multi-modal information. We firstly design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. We then propose a curriculum negative sampling strategy tailored for the sequential inputs.
arXiv Detail & Related papers (2022-09-02T11:50:41Z)
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.