Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
- URL: http://arxiv.org/abs/2512.16914v1
- Date: Thu, 18 Dec 2025 18:59:46 GMT
- Authors: Nikhil Prakash, Donghao Ren, Dominik Moritz, Yannick Assogba,
- Abstract summary: Constructive Circuit Amplification identifies tokens from model reasoning traces as well as model components responsible for the desired task. It improves accuracy by up to +11.4% across multiple models while modifying as little as 1.59% of model components.
- Score: 17.40366590937297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening of existing circuits in the model. Taken together, these findings suggest the possibility of intervening directly on such circuits to make precise, task-targeted updates. Motivated by these findings, we propose a novel method called Constructive Circuit Amplification which identifies pivotal tokens from model reasoning traces as well as model components responsible for the desired task, and updates only those components. Applied to mathematical reasoning, it improves accuracy by up to +11.4% across multiple models while modifying as little as 1.59% of model components, with minimal impact on other abilities as measured by MMLU, TriviaQA, and TruthfulQA. These results demonstrate that targeted capabilities can be reliably enhanced by selectively updating a sparse set of model components.
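The core idea in the abstract, updating only a sparse set of components and leaving the rest untouched, can be illustrated with a minimal sketch. This is not the authors' implementation: the component names, the toy gradients, and the helpers `masked_update` and `fraction_modified` are hypothetical, and the selection of components (which the paper derives from pivotal tokens in reasoning traces) is simply assumed to be given.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def masked_update(params: Params, grads: Params, selected: Set[str], lr: float = 0.1) -> Params:
    """Apply one gradient-descent step to the selected components only.

    Components outside `selected` are copied unchanged (frozen).
    """
    updated: Params = {}
    for name, weights in params.items():
        if name in selected:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)
    return updated


def fraction_modified(params: Params, selected: Set[str]) -> float:
    """Share of components that receive an update."""
    return sum(1 for name in params if name in selected) / len(params)


# Toy model with four hypothetical components (two attention heads, two MLPs).
params = {
    "head_0": [1.0, 2.0],
    "head_1": [0.5, -0.5],
    "mlp_0": [0.0, 1.0],
    "mlp_1": [3.0, 3.0],
}
grads = {name: [1.0, 1.0] for name in params}  # placeholder gradients
selected = {"head_1"}  # assumed output of a circuit-identification step

new_params = masked_update(params, grads, selected, lr=0.5)
print(new_params["head_1"])                 # only this component changes
print(fraction_modified(params, selected))  # share of components touched
```

In a real training framework the same effect would typically be achieved by freezing or masking gradients for the non-selected parameters; the dictionary-of-lists representation here is only for illustration.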
Related papers
- Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems [1.794523136381106]
Large Language Models (LLMs) provide descriptions of models for decision-making.
These descriptions do not reflect true model capabilities and performance characteristics, leading to suboptimal model selection, reduced accuracy, and increased energy costs.
We propose GUIDE, a new energy-aware model selection framework that accounts for performance-energy trade-offs by incorporating quantitative model performance characteristics in decision-making.
arXiv Detail & Related papers (2025-11-30T21:46:54Z)
- Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation [61.248535801314375]
The paper proposes Subset-Selected Counterfactual Augmentation (SS-CA).
We develop Counterfactual LIMA to identify minimal spatial region sets whose removal can selectively alter model predictions.
Experiments show that SS-CA improves generalization on in-distribution (ID) test data and achieves superior performance on out-of-distribution (OOD) benchmarks.
arXiv Detail & Related papers (2025-11-15T08:39:22Z)
- Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch [63.40752011615843]
Training tool-augmented language models has emerged as a promising approach to enhancing their capabilities for complex tasks.
We propose a dynamic generalization-guided reward design for rule-based reinforcement learning.
We show that our models achieve over 7% performance improvement compared to both SFT and RL-with-SFT models.
arXiv Detail & Related papers (2025-11-02T16:33:45Z)
- Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training [121.5858973157225]
We investigate the effects of prolonged reinforcement learning on a small language model across a diverse set of reasoning domains.
We introduce controlled KL regularization, clipping ratio, and periodic reference policy resets as critical components for unlocking long-term performance gains.
Our model achieves significant improvements over strong baselines, including +14.7% on math, +13.9% on coding, and +54.8% on logic puzzle tasks.
arXiv Detail & Related papers (2025-07-16T17:59:24Z)
- Activation Reward Models for Few-Shot Model Alignment [77.37511364793515]
We introduce Activation Reward Models (Activation RMs).
Activation RMs leverage activation steering to construct well-aligned reward signals using minimal supervision and no additional model finetuning.
We demonstrate the effectiveness of Activation RMs in mitigating reward hacking behaviors, highlighting their utility for safety-critical applications.
arXiv Detail & Related papers (2025-07-02T05:10:29Z)
- Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs [15.23174472320989]
Large Language Models (LLMs) are central to many contemporary AI applications.
Recent works in eXplainable AI (XAI) suggest that interpretability can also enable model compression.
arXiv Detail & Related papers (2025-06-16T17:38:36Z)
- Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification [17.512415475301395]
We investigate model editing as an efficient method for adapting large language models (LLMs) to solve aspect-based sentiment classification.
Our findings reveal that a distinct set of mid-layer representations is essential for detecting the sentiment polarity of given aspect words.
We develop a model editing approach that focuses exclusively on these critical parts of the LLM, leading to a more efficient method for adapting LLMs.
arXiv Detail & Related papers (2025-03-19T11:21:37Z)
- Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis [37.37040454356059]
This paper aims to provide an in-depth interpretation of the fine-tuning process through circuit analysis.
We identify circuits at various checkpoints during fine-tuning and examine the interplay between circuit analysis, fine-tuning methods, and task complexities.
arXiv Detail & Related papers (2025-02-17T13:59:41Z)
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift [2.2310395620011945]
We evaluate how reward model performance is affected by distribution shift.
We show novel calibration patterns and accuracy drops due to OOD prompts and responses.
We adapt an OOD detection technique commonly used in classification to the reward model setting to detect these distribution shifts.
arXiv Detail & Related papers (2023-11-21T18:41:26Z)
- DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks despite their large computational overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.