Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
- URL: http://arxiv.org/abs/2601.05073v1
- Date: Thu, 08 Jan 2026 16:17:56 GMT
- Title: Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
- Authors: Jianlong Chen, Daocheng Fu, Shengze Xu, Jiawei Chen, Yuan Feng, Yue Yang, Junchi Yan, Hongyuan Zha, Renqiu Xia
- Abstract summary: We introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine. We propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse signals with dense rewards based on the Skeleton Rate.
- Score: 67.00373428443879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine, which converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse signals with dense rewards based on the Skeleton Rate. Experiments show that SGVR not only enhances geometric performance (+9.7%) but also exhibits strong generalization, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%) across diverse domains.
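The abstract does not spell out how the Skeleton Rate turns verified subgoals into a dense reward. A minimal sketch of the idea, assuming the Skeleton Rate is the fraction of numeric subgoals a solution trace satisfies and that the final reward mixes it with outcome correctness (the function names, tolerance, and weights below are illustrative assumptions, not the paper's actual definitions):

```python
def skeleton_rate(predicted, subgoals, tol=1e-6):
    """Fraction of numeric subgoals matched within a tolerance (assumed definition).

    predicted: dict mapping subgoal names to the model's computed values.
    subgoals:  dict mapping subgoal names to ground-truth verified values.
    """
    if not subgoals:
        return 0.0
    hits = sum(
        1 for name, target in subgoals.items()
        if name in predicted and abs(predicted[name] - target) <= tol
    )
    return hits / len(subgoals)


def sgvr_reward(predicted, subgoals, final_correct, w_sub=0.7, w_final=0.3):
    """Dense reward: a weighted mix of the skeleton rate and the final outcome.

    The weighting scheme is a placeholder; the paper's actual reward shaping
    may differ.
    """
    return w_sub * skeleton_rate(predicted, subgoals) + w_final * float(final_correct)
```

Under this sketch, a trace that reaches the right answer by luck while missing most intermediate subgoals earns a low reward, which is the divergence between reasoning quality and outcome accuracy the paper targets.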
Related papers
- Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models [15.849480549367684]
We propose DAGVul, a novel framework that models vulnerability reasoning as a Directed Acyclic Graph (DAG) generation task. By further introducing Reinforcement Learning with Verifiable Rewards (RLVR), we align model reasoning traces with program-intrinsic logic. Our framework improves the reasoning F1-score by an average of 18.9% over all the baselines.
arXiv Detail & Related papers (2026-02-06T13:19:45Z) - How Does Prefix Matter in Reasoning Model Tuning? [57.69882799751655]
We fine-tune three R1 series models across four core model capabilities: reasoning (mathematics), coding, safety, and factuality. Results show that prefix-conditioned SFT improves both safety and reasoning performance, yielding up to +6% higher Safe@1 accuracy.
arXiv Detail & Related papers (2026-01-04T18:04:23Z) - Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models [7.230514235208748]
We propose a novel reasoning framework called Multi-chain Graph Refinement & Selection (MGRS). MGRS significantly advances both the reasoning capability and computational efficiency of reasoning enhancement methods. On the 24-point game, MGRS attains 100% accuracy for the first time, while delivering a 13.6x speed-up compared to the leading Forest of Thoughts framework.
arXiv Detail & Related papers (2025-11-28T12:35:16Z) - EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models [32.041688648831794]
We introduce EffiReason-Bench, a unified benchmark for rigorous cross-paradigm evaluation of efficient reasoning methods. To enable step-by-step evaluation, we construct verified CoT annotations for CommonsenseQA and LogiQA. We propose the E3-Score, a principled metric inspired by economic trade-off modeling that provides smooth, stable evaluation without discontinuities.
arXiv Detail & Related papers (2025-11-13T11:14:46Z) - Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling [82.52485740425321]
Adversarial attacks present a critical challenge to deep neural networks' robustness. The transferability of adversarial attacks faces a dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization).
arXiv Detail & Related papers (2025-11-01T05:43:47Z) - Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z) - Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z) - Reinforcing Video Reasoning with Focused Thinking [65.85683941058916]
We propose TW-GRPO, a novel framework that enhances visual reasoning with focused thinking and dense reward granularity. Specifically, we employ a token weighting mechanism that prioritizes tokens with high informational density. We also reformulate RL training by shifting from single-choice to multi-choice QA tasks.
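The TW-GRPO summary does not define its "informational density" measure. A hedged sketch of one plausible instantiation, approximating density by within-sequence surprisal (rarer tokens in a trace get higher weight); the function names and the surprisal proxy are illustrative assumptions, not the paper's actual mechanism:

```python
import math
from collections import Counter


def token_weights(tokens):
    """Normalized per-token weights from within-sequence surprisal (assumed proxy)."""
    counts = Counter(tokens)
    n = len(tokens)
    # surprisal-style score: rarer tokens within this trace score higher
    raw = [-math.log(counts[t] / n) for t in tokens]
    total = sum(raw) or 1.0  # guard against the all-identical-token case
    return [r / total for r in raw]


def weighted_objective(token_logprobs, advantage, tokens):
    """Per-token policy-gradient term scaled by token weight (structure assumed)."""
    w = token_weights(tokens)
    return advantage * sum(wi * lp for wi, lp in zip(w, token_logprobs))
```

The point of such a weighting is that the gradient signal concentrates on informative tokens rather than being spread uniformly across the trace.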
arXiv Detail & Related papers (2025-05-30T15:42:19Z) - NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation [23.592137999309546]
NeSyGeo is a novel neuro-symbolic framework for generating geometric reasoning data. We release a new benchmark, NeSyGeo-Test, for evaluating geometric reasoning abilities in MLLMs.
arXiv Detail & Related papers (2025-05-21T16:45:49Z) - ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and
Gradient Accumulation [106.04777600352743]
Differentiable architecture search (DARTS) is largely hindered by its substantial memory cost since the entire supernet resides in the memory.
Single-path DARTS addresses this by choosing only a single-path submodel at each step. While memory-friendly, it also comes with low computational cost. We propose a new algorithm, RObustifying Memory-Efficient NAS (ROME), as a cure.
arXiv Detail & Related papers (2020-11-23T06:34:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.