Related papers: Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

URL: http://arxiv.org/abs/2410.16930v2
Date: Tue, 18 Feb 2025 19:45:14 GMT
Title: Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
Authors: Bryan R. Christ, Zack Gottesman, Jonathan Kropko, Thomas Hartvigsen,
Abstract summary: Math reasoning is a hallmark of artificial intelligence and has implications in several domains, including math education. Few works have explored how math reasoning is encoded within Large Language Model parameters and if it is a skill that can be isolated within models. We introduce MathNeuro, a computationally efficient method to isolate math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by filtering out those important for general language tasks.
Score: 10.314228434999924
License:
Abstract: Math reasoning is an active area of Large Language Model (LLM) research because it is a hallmark of artificial intelligence and has implications in several domains, including math education. However, few works have explored how math reasoning is encoded within LLM parameters and if it is a skill that can be isolated within models. Doing so could allow targeted intervention to improve math performance without altering non-math behavior and foster understanding of how models encode math reasoning. We introduce Math Neurosurgery (MathNeuro), a computationally efficient method we use to isolate math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by filtering out those important for general language tasks. Through pruning parameters MathNeuro identifies, we delete a LLM's math reasoning ability without significantly impacting its general language ability. Scaling the identified parameters by a small constant improves a pretrained or instruction-tuned LLM's performance by 4-17% on GSM8K and 5-35% on MATH while leaving non-math behavior unaltered. MathNeuro is also data efficient: most of its effectiveness holds when identifying math-specific parameters using a single sample. MathNeuro highlights the potential for future work to intervene on math-specific parameters.

Related papers

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist [46.670206614087334]
We argue that if a model really understands a problem, it should be robustly applied across a diverse array of tasks. MathCheck is a well-designed checklist for testing task generalization and reasoning. MathCheck better reflects true mathematical abilities and represents mathematical intelligence more linearly.
arXiv Detail & Related papers (2024-07-11T17:58:58Z)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models. MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. One essential and frequently occurring evidence is that when the math questions are slightly changed, LLMs can behave incorrectly. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z)
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [130.37945867605302]
We present WizardMath, which enhances the mathematical CoT reasoning abilities of large language models (LLMs) without using external python tools. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
Learning Multi-Step Reasoning by Solving Arithmetic Tasks [6.398022050054328]
This work investigates how to incorporate relatively small Language Models with the capabilities of multi-step reasoning. We propose to inject such abilities by continually pre-training LMs on a synthetic dataset MsAT. Our experiments on four math word problem datasets show the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-06-02T17:29:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.