Related papers: †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems

†DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems

URL: http://arxiv.org/abs/2601.06853v1
Date: Sun, 11 Jan 2026 10:51:03 GMT
Title: †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
Authors: Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar,
Abstract summary: Chain-of-Thought (CoT) prompting is widely adopted for mathematical problem solving, but its behavior under irrelevant context remains underexplored.<n>DisTRACTMATH-BN is a benchmark that augments MGSM and MSVAMP with semantically coherent but computationally irrelevant information.<n>DAGGER reformulates mathematical problem solving as executable computational graph generation with explicit modeling of distractor nodes.
Score: 1.2310602580215997
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain-of-Thought (CoT) prompting is widely adopted for mathematical problem solving, including in low-resource languages, yet its behavior under irrelevant context remains underexplored. To systematically study this challenge, we introduce DISTRACTMATH-BN, a Bangla benchmark that augments MGSM and MSVAMP with semantically coherent but computationally irrelevant information. Evaluating seven models ranging from 3B to 12B parameters, we observe substantial performance degradation under distractors: standard models drop by up to 41 points, while reasoning-specialized models decline by 14 to 20 points despite consuming five times more tokens. We propose †DAGGER, which reformulates mathematical problem solving as executable computational graph generation with explicit modeling of distractor nodes. Fine-tuning Gemma-3 models using supervised fine-tuning followed by Group Relative Policy Optimization achieves comparable weighted accuracy on augmented benchmarks while using 89 percent fewer tokens than reasoning models. Importantly, this robustness emerges without explicit training on distractor-augmented examples. Our results suggest that enforcing structured intermediate representations improves robustness and inference efficiency in mathematical reasoning compared to free-form approaches, particularly in noisy, low-resource settings.

Related papers

From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics [79.81905350372067]
We study gap through contextual mathematical reasoning.<n>We introduce ContextMATH, a benchmark that repurposes AIME and MATH-500 problems into two contextual settings.<n>Open-source models decline by 13 and 34 points on SG and CS, while proprietary models drop by 13 and 20.
arXiv Detail & Related papers (2026-01-30T14:56:04Z)
Once Upon an Input: Reasoning via Per-Instance Program Synthesis [19.86168542588911]
We introduce Per-Instance Program Synthesis (PIPS), a method that generates and refines programs at the instance-level using structural feedback.<n>To further improve performance, PIPS incorporates a confidence metric that dynamically chooses between direct inference and program synthesis on a per-instance basis.
arXiv Detail & Related papers (2025-10-26T21:58:33Z)
Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression [68.69801176669843]
We propose an online post-training RL method that prunes redundant steps and estimates difficulty.<n> TRAAC (Think Right with Adaptive, Attentive Compression) achieves an average absolute accuracy gain of 8.4%.<n>Although our models are trained on math datasets, they show accuracy and efficiency gains on out-of-distribution non-math datasets.
arXiv Detail & Related papers (2025-10-02T02:00:20Z)
Heterogeneous Graph Prompt Learning via Adaptive Weight Pruning [37.735384483052044]
Graph Neural Networks (GNNs) have achieved remarkable success in various graph-based tasks (e.g., node classification or link prediction)<n>Despite their triumphs, GNNs still face challenges such as long training and inference times, difficulty in capturing complex relationships, and insufficient feature extraction.<n>We propose a novel framework combining graph prompts with weight pruning, called GPAWP, which aims to enhance the performance and efficiency of graph prompts by using fewer of them.
arXiv Detail & Related papers (2025-07-12T04:12:24Z)
ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [74.37307916314407]
We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely.<n>Experiments on the state-of-the-art LRMs, including DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
Quantization Meets Reasoning: Exploring and Mitigating Degradation of Low-Bit LLMs in Mathematical Reasoning [39.56908863102256]
Low-bit post-training quantization impairs mathematical reasoning up to 69.81% in harder settings.<n>We address two deployment-critical questions with process-level precision.<n>In our settings, as few as 332 curated examples and 3--5 minutes of compute on a single GPU recover 4-bit weight math reasoning toward the full-precision baseline.
arXiv Detail & Related papers (2025-05-16T12:11:40Z)
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [64.74765550805024]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs.<n>We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints.<n>SoT achieves token reductions of up to 84% with minimal accuracy loss across 18 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes.<n>We present a novel framework for identifying these tokens through rollout sampling.<n>We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z)
Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.<n>Existing direct preference learning algorithms are originally designed for the single-turn chat task.<n>We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results. We develop a method that avoids introducing external resources, relying instead on perturbations to the input. Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
Joint Graph Learning and Model Fitting in Laplacian Regularized Stratified Models [5.933030735757292]
Laplacian regularized stratified models (LRSM) are models that utilize the explicit or implicit network structure of the sub-problems. This paper shows the importance and sensitivity of graph weights in LRSM, and provably show that the sensitivity can be arbitrarily large. We propose a generic approach to jointly learn the graph while fitting the model parameters by solving a single optimization problem.
arXiv Detail & Related papers (2023-05-04T06:06:29Z)
Gaussian Graphical Model Selection for Huge Data via Minipatch Learning [1.2891210250935146]
We propose the Minipatch Graph (MPGraph) estimator to solve the problem of graphical model selection. MPGraph is a generalization of thresholded graph estimators fit to tiny, random subsets of both the observations and the nodes. We prove that our algorithm achieves finite-sample graph selection consistency.
arXiv Detail & Related papers (2021-10-22T21:06:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.