JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem
Understanding
- URL: http://arxiv.org/abs/2206.06315v1
- Date: Mon, 13 Jun 2022 17:03:52 GMT
- Title: JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem
Understanding
- Authors: Wayne Xin Zhao, Kun Zhou, Zheng Gong, Beichen Zhang, Yuanhang Zhou,
Jing Sha, Zhigang Chen, Shijin Wang, Cong Liu, Ji-Rong Wen
- Abstract summary: This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
- Score: 74.12405417718054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to advance the mathematical intelligence of machines by
presenting the first Chinese mathematical pre-trained language model (PLM) for
effectively understanding and representing mathematical problems. Unlike other
standard NLP tasks, mathematical texts are difficult to understand, since they
involve mathematical terminology, symbols and formulas in the problem
statement. Typically, it requires complex mathematical logic and background
knowledge for solving mathematical problems.
Considering the complex nature of mathematical texts, we design a novel
curriculum pre-training approach for improving the learning of mathematical
PLMs, consisting of both basic and advanced courses. Specifically, we first
perform token-level pre-training based on a position-biased masking strategy,
and then design logic-based pre-training tasks that aim to recover the shuffled
sentences and formulas, respectively. Finally, we introduce a more difficult
pre-training task that requires the PLM to detect and correct the errors in its
generated solutions. We conduct extensive experiments on offline evaluation
(including nine math-related tasks) and an online A/B test. Experimental results
demonstrate the effectiveness of our approach compared with a number of
competitive baselines. Our code is available at:
https://github.com/RUCAIBox/JiuZhang
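The two basic-course ideas named in the abstract, position-biased masking and recovery of shuffled sentences or formulas, can be illustrated with a minimal sketch. The abstract does not give the masking schedule or the data format, so everything below is an assumption made for illustration: the linear ramp of the masking probability toward later positions, the `[MASK]` placeholder, and the hypothetical helper names `position_biased_mask`, `make_shuffle_example` and `restore`. It is not the authors' implementation.

```python
import random

MASK = "[MASK]"  # placeholder token; an assumption, not taken from the paper


def position_biased_mask(tokens, base_rate=0.15, seed=None):
    """Mask tokens with a probability that grows linearly with position.

    Assumed schedule: the probability ramps from 0 at the first token to
    2 * base_rate at the last one, so the average rate is about base_rate.
    Returns the masked sequence and per-position labels (None if unmasked).
    """
    rng = random.Random(seed)
    n = len(tokens)
    masked, labels = [], []
    for i, tok in enumerate(tokens):
        # A single-token input has no position gradient; fall back to base_rate.
        p = 2 * base_rate * i / (n - 1) if n > 1 else base_rate
        if rng.random() < p:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def make_shuffle_example(segments, seed=None):
    """Shuffle sentences or formulas and record the permutation as the target.

    order[j] is the original index of the segment now at shuffled position j.
    """
    rng = random.Random(seed)
    order = list(range(len(segments)))
    rng.shuffle(order)
    shuffled = [segments[i] for i in order]
    return shuffled, order


def restore(shuffled, order):
    """Invert the permutation, as a model that predicts `order` could."""
    original = [None] * len(shuffled)
    for j, i in enumerate(order):
        original[i] = shuffled[j]
    return original


if __name__ == "__main__":
    steps = ["Let x be the unknown.", "Then 2x + 3 = 11.", "So x = 4."]
    print(position_biased_mask(" ".join(steps).split(), seed=0)[0])
    shuffled, order = make_shuffle_example(steps, seed=0)
    print(shuffled, order, restore(shuffled, order) == steps)
```

Under this assumed schedule, later tokens, which tend to depend on earlier reasoning steps, are hidden more often, while the recorded permutation gives a supervised target for the recovery task.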
Related papers
- MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code [38.127313175508746]
We introduce a novel method for generating mathematical code accompanied with corresponding reasoning steps for continued pretraining.
Our approach begins with the construction of a high-quality mathematical continued pretraining dataset.
Appending the generated code to each reasoning step results in data consisting of paired natural language reasoning steps and their corresponding code.
arXiv Detail & Related papers (2024-10-10T17:58:40Z)
- LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving.
LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge.
It successfully proves 155 theorems previously unproved formally by humans across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
- MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula [33.5782208232163]
We propose MathCAMPS: a method to synthesize high-quality mathematical problems at scale.
We encode each standard in a formal grammar, allowing us to sample diverse symbolic problems and their answers.
We derive follow-up questions from symbolic structures and convert them into follow-up word problems.
arXiv Detail & Related papers (2024-07-01T01:56:28Z)
- LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
- MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models.
MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z)
- Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks [34.09857430966818]
We introduce an extensive mathematics dataset called "MathQuest" sourced from the 11th and 12th standard Mathematics NCERT textbooks.
We conduct fine-tuning experiments with three prominent large language models: LLaMA-2, WizardMath, and MAmmoTH.
Our experiments reveal that among the three models, MAmmoTH-13B emerges as the most proficient, achieving the highest level of competence in solving the presented mathematical problems.
arXiv Detail & Related papers (2024-04-19T08:45:42Z)
- FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models [44.63505885248145]
FineMath is a fine-grained mathematical evaluation benchmark dataset for assessing Chinese Large Language Models (LLMs).
FineMath is created to cover the key mathematical concepts taught in elementary school math, which are divided into 17 categories of math word problems.
All the 17 categories of math word problems are manually annotated with their difficulty levels according to the number of reasoning steps required to solve these problems.
arXiv Detail & Related papers (2024-03-12T15:32:39Z)
- Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes.
Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
- A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade.
Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
arXiv Detail & Related papers (2022-12-20T18:46:16Z)
- Learning to Match Mathematical Statements with Proofs [37.38969121408295]
The task is designed to improve the processing of research-level mathematical texts.
We release a dataset for the task, consisting of over 180k statement-proof pairs.
We show that considering the assignment problem globally and using weighted bipartite matching algorithms substantially helps in tackling the task.
arXiv Detail & Related papers (2021-02-03T15:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.