Related papers: Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

URL: http://arxiv.org/abs/2410.16930v1
Date: Tue, 22 Oct 2024 12:00:58 GMT
Title: Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
Authors: Bryan R. Christ, Zack Gottesman, Jonathan Kropko, Thomas Hartvigsen
Abstract summary: We introduce Math Neurosurgery (MathNeuro), a method for isolating math-specific parameters in Large Language Models (LLMs). Pruning the parameters MathNeuro identifies deletes an LLM's math reasoning ability without destroying its general language ability. MathNeuro highlights the potential for future work to intervene on math-specific parameters.
Score: 10.314228434999924
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Math reasoning is a highly active area of Large Language Model (LLM) research because it is a hallmark of artificial intelligence. However, few works have explored how math reasoning is encoded within LLM parameters and whether it is a skill that can be isolated within a model. Doing so could allow targeted intervention to improve math performance without altering non-math behavior and foster understanding of how models encode math reasoning. We introduce Math Neurosurgery (MathNeuro), a method for isolating math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by removing those important for general language tasks. Pruning the parameters MathNeuro identifies deletes an LLM's math reasoning ability without destroying its general language ability. Scaling these parameters by a small constant improves a pretrained or instruction-tuned LLM's performance by 4-17% on GSM8K while leaving non-math behavior unaltered. MathNeuro is also data efficient: most of its effectiveness holds when identifying math-specific parameters using a single sample. MathNeuro highlights the potential for future work to intervene on math-specific parameters.
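To make the abstract's pipeline concrete (score parameters from weights and activations, keep those important for math but not for general text, then prune or scale them), here is a minimal NumPy sketch for a single linear layer. It is not the authors' implementation. The abstract does not give the exact importance formula, so this sketch assumes a Wanda-style score |W_ij| * ||X_j||_2 with a per-output-row top-fraction cutoff. The function names, the `fraction` threshold, and the `scale` value of 1.1 are all hypothetical illustrations, and random arrays stand in for activations that would in practice be collected with forward passes on math and general-language samples.

```python
import numpy as np

def importance_scores(weight: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Assumed Wanda-style score |W_ij| * ||X_j||_2 (not confirmed as MathNeuro's formula).

    weight:      (out_features, in_features) matrix of a linear layer
    activations: (num_tokens, in_features) inputs to that layer, gathered by forward passes
    """
    col_norms = np.linalg.norm(activations, axis=0)  # L2 norm of each input feature
    return np.abs(weight) * col_norms[None, :]

def top_fraction_mask(scores: np.ndarray, fraction: float) -> np.ndarray:
    """Mark the top `fraction` of scores within each output row (hypothetical cutoff)."""
    k = max(1, int(round(fraction * scores.shape[1])))
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]  # column indices of the k largest per row
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def math_specific_mask(weight, math_acts, general_acts, fraction=0.1):
    """Parameters important on math inputs but NOT important on general-language inputs."""
    math_top = top_fraction_mask(importance_scores(weight, math_acts), fraction)
    general_top = top_fraction_mask(importance_scores(weight, general_acts), fraction)
    return math_top & ~general_top

def intervene(weight, mask, scale=None):
    """scale=None prunes (zeroes) the masked weights; otherwise multiplies them by `scale`."""
    out = weight.copy()
    if scale is None:
        out[mask] = 0.0
    else:
        out[mask] *= scale
    return out

# Usage on toy data: random stand-ins for one layer's weights and its recorded inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
math_X = rng.normal(size=(32, 16))     # would come from math samples
general_X = rng.normal(size=(32, 16))  # would come from general-language samples

mask = math_specific_mask(W, math_X, general_X, fraction=0.25)
pruned = intervene(W, mask)             # "deletion" experiment
scaled = intervene(W, mask, scale=1.1)  # "small constant" boost (value is illustrative)
```

The set difference is what separates this from plain importance-based pruning: a weight that matters for both math and general text is left untouched, so only the math-specific remainder is pruned or scaled.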

Related papers

Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics [15.695635219034328]
This work focuses on the extent to which LLMs ground their programs to math rules, and how that affects their end performance. Our results reveal that the distribution of grounding depends on LLMs' capabilities and the difficulty of math problems. On MATH500, the percentage of grounded programs decreased to half, while the ungrounded generations doubled in comparison to ASDiv grade-school problems.
arXiv Detail & Related papers (2025-04-24T15:34:24Z)
AI-Assisted Generation of Difficult Math Questions [78.7547836422727]
Current training positions mathematical reasoning as a core capability. There is unmet demand for diverse and challenging math questions. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach.
arXiv Detail & Related papers (2024-07-30T17:55:36Z)
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist [46.670206614087334]
We argue that if a model really understands a problem, it should be able to apply that understanding robustly across a diverse array of tasks. MathCheck is a well-designed checklist for testing task generalization and reasoning. MathCheck better reflects true mathematical abilities and represents mathematical intelligence more linearly.
arXiv Detail & Related papers (2024-07-11T17:58:58Z)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models. MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. One essential and frequently observed piece of evidence is that LLMs can behave incorrectly when math questions are slightly changed. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of its tools: a knowledge retriever (Bing Web Search), a program generator and executor (Python), and a symbolic equation solver (Wolfram-Alpha API).
arXiv Detail & Related papers (2024-02-27T05:50:35Z)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs InternLM-Math, which are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a single seq2seq format. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z)
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [130.37945867605302]
We present WizardMath, which enhances the mathematical CoT reasoning abilities of large language models (LLMs) without using external python tools. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
Learning Multi-Step Reasoning by Solving Arithmetic Tasks [6.398022050054328]
This work investigates how to equip relatively small language models with multi-step reasoning capabilities. We propose to inject such abilities by continually pre-training LMs on a synthetic dataset, MsAT. Our experiments on four math word problem datasets show the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-06-02T17:29:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.