GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients
- URL: http://arxiv.org/abs/2601.10229v2
- Date: Tue, 20 Jan 2026 05:37:44 GMT
- Title: GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients
- Authors: Kentaro Kazama, Daiki Shirafuji, Tatsuhiko Saito
- Abstract summary: We propose GeoSteer, a manifold-based framework that improves the quality of intermediate reasoning. The method consists of: (1) constructing a CoT dataset with step-level scores, (2) training a Variational Autoencoder (VAE) model and a quality estimation model to learn a low-dimensional manifold of high-quality CoT trajectories, and (3) steering hidden states of target LLMs toward higher-quality regions in the latent space.
- Score: 1.8033500402815792
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in Large Language Models (LLMs) have demonstrated remarkable progress in their reasoning capabilities, such as Chain-of-Thought (CoT). Most of these approaches rely on CoT rationales. Previous studies have shown that LLMs often generate logically inconsistent reasoning steps even when their final answers are correct, and these inconsistencies reduce the reliability of the reasoning process. We propose GeoSteer, a manifold-based framework that improves the quality of intermediate reasoning. The method consists of: (1) constructing a CoT dataset with step-level scores, (2) training a Variational Autoencoder (VAE) and a quality estimation model to learn a low-dimensional manifold of high-quality CoT trajectories, and (3) steering hidden states of target LLMs toward higher-quality regions in the latent space. This last step steers the hidden states by following gradients along the learned manifold, which yields geometrically coherent steering. Evaluation experiments were conducted on the GSM8k dataset using the Qwen3 series, with two metrics: answer accuracy and overall reasoning quality. GeoSteer improved accuracy by 0.9 points and reasoning quality by 4.5 points on average compared with the original LLMs. These results indicate that GeoSteer provides an effective and controllable mechanism for improving the quality of intermediate reasoning in LLMs.
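As a concrete illustration of step (3), below is a minimal sketch of latent-manifold gradient steering. The class names (CoTVAE, QualityHead), network sizes, step size, and number of ascent steps are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Hedged sketch of manifold-gradient steering, not the authors' implementation.
# Assumed pieces: a VAE over per-step hidden states and a latent quality head.
import torch
import torch.nn as nn


class CoTVAE(nn.Module):
    """Toy VAE over hidden states of CoT reasoning steps (illustrative sizes)."""

    def __init__(self, hidden_dim: int = 4096, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, 512), nn.GELU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.GELU(), nn.Linear(512, hidden_dim)
        )

    def encode(self, h: torch.Tensor):
        e = self.encoder(h)
        return self.to_mu(e), self.to_logvar(e)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)


class QualityHead(nn.Module):
    """Predicts a step-level quality score from a latent code."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.GELU(), nn.Linear(128, 1))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).squeeze(-1)


def steer_hidden_state(h: torch.Tensor, vae: CoTVAE, quality: QualityHead,
                       step_size: float = 0.1, n_steps: int = 5) -> torch.Tensor:
    """Move a hidden state toward higher predicted quality in the latent space."""
    mu, _ = vae.encode(h)
    z = mu.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        score = quality(z).sum()
        grad, = torch.autograd.grad(score, z)  # gradient of quality w.r.t. latent code
        z = (z + step_size * grad).detach().requires_grad_(True)  # ascend in latent space
    return vae.decode(z).detach()  # decoded state replaces the original activation
```

The point of the sketch is that the ascent happens in the VAE latent space rather than directly on raw hidden states, which is what keeps the steering on (or near) the learned manifold of high-quality trajectories.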
Related papers
- S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs [48.80914119283909]
Large language models equipped with chain-of-thought (CoT) achieve strong performance and offer a window into behavior. Recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes. Our study presents a self-sampling framework based on activation steering for efficient CoT learning.
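The summary mentions activation steering; as a generic illustration of that technique (not the S3-CoT authors' code), a steering direction can be derived by contrasting mean activations of two sets of examples and added to a layer's output at generation time. The function names and the scaling factor alpha below are assumptions.

```python
# Illustrative activation-steering sketch, not the S3-CoT implementation.
import torch


def build_steering_vector(succinct_acts: torch.Tensor,
                          redundant_acts: torch.Tensor) -> torch.Tensor:
    """Both inputs: (num_examples, hidden_dim) activations from one layer."""
    direction = succinct_acts.mean(dim=0) - redundant_acts.mean(dim=0)
    return direction / direction.norm()


def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, alpha: float = 4.0):
    """Register a forward hook that nudges the layer output along `direction`."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)
```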
arXiv Detail & Related papers (2026-02-02T11:37:36Z) - Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward [67.00373428443879]
We introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine. We propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse signals with dense rewards based on the Skeleton Rate.
arXiv Detail & Related papers (2026-01-08T16:17:56Z) - Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning [44.07085022671951]
Trajectories that introduce novel gradient directions receive a bounded multiplicative reward scaler. G2RL consistently improves pass@1, maj@16, and pass@k over entropy-based GRPO and external embedding methods.
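As a hedged sketch of that idea (the exact scoring and bounds used by G2RL are not given here), a trajectory's policy gradient could be compared against recent gradients via cosine similarity, and the resulting novelty mapped into a bounded multiplicative factor:

```python
# Assumed formulation for illustration only, not the G2RL paper's exact rule.
import torch


def novelty_scaler(traj_grad: torch.Tensor,
                   history_grads: torch.Tensor,
                   max_scale: float = 1.5) -> float:
    """traj_grad: (d,) flattened gradient for one trajectory.
    history_grads: (k, d) gradients from recently used trajectories."""
    sims = torch.nn.functional.cosine_similarity(
        history_grads, traj_grad.unsqueeze(0), dim=-1)
    novelty = 1.0 - sims.max().clamp(min=0.0)        # 0 = redundant, 1 = orthogonal/new
    return float(1.0 + (max_scale - 1.0) * novelty)  # bounded in [1, max_scale]


# usage sketch: reward = novelty_scaler(g, recent_grads) * base_reward
```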
arXiv Detail & Related papers (2025-12-17T18:44:45Z) - What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation [67.47463575774388]
We decompose reasoning quality into two dimensions: relevance and coherence. To measure these aspects reliably, we introduce causal stepwise evaluation (CaSE). We show that curating training data with CaSE-evaluated relevance and coherence directly improves final task performance.
arXiv Detail & Related papers (2025-10-23T14:30:37Z) - Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding [59.60915947702282]
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs). Existing RLVR methods often suffer from exploration inefficiency due to mismatches between the training data's difficulty and the model's capability. We propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region.
arXiv Detail & Related papers (2025-09-08T17:36:21Z) - When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen, and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z) - GeoSR: Cognitive-Agentic Framework for Probing Geospatial Knowledge Boundaries via Iterative Self-Refinement [4.026524042818433]
GeoSR is a self-refining agentic reasoning framework that embeds core geographic principles into an iterative prediction loop. We validate GeoSR on tasks ranging from physical-world property estimation to socioeconomic prediction.
arXiv Detail & Related papers (2025-08-06T04:45:34Z) - Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning [81.50681925980135]
We propose Stepwise Reasoning Checkpoint Analysis (SRCA), a framework that introduces checkpoints between reasoning steps. It incorporates two key strategies: (1) Answer-Clustered Search, which groups reasoning paths by their intermediate checkpoint answers to maintain diversity while ensuring quality, and (2) Checkpoint Candidate Augmentation, which leverages all intermediate answers for final decision-making. Our approach effectively reduces path homogenization and creates a fault-tolerant mechanism by utilizing high-quality intermediate results.
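A rough illustration of Answer-Clustered Search (not the SRCA implementation; field names and the per-cluster budget are assumptions): partial reasoning paths are grouped by their intermediate checkpoint answer, and the top-scored paths within each cluster survive, preserving answer diversity while keeping quality.

```python
# Hedged sketch of answer-clustered selection at a reasoning checkpoint.
from collections import defaultdict


def answer_clustered_select(paths, per_cluster: int = 2):
    """paths: list of dicts like {"text": ..., "checkpoint_answer": ..., "score": ...}."""
    clusters = defaultdict(list)
    for p in paths:
        clusters[p["checkpoint_answer"]].append(p)
    survivors = []
    for group in clusters.values():
        group.sort(key=lambda p: p["score"], reverse=True)
        survivors.extend(group[:per_cluster])  # diversity across answers, quality within
    return survivors
```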
arXiv Detail & Related papers (2025-05-23T12:42:50Z) - TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving [106.04001249574786]
TrustGeoGen is a data engine that generates formally verified geometric problems to establish a principled and trustworthy benchmark. Our engine integrates four key innovations: 1) Multimodal Alignment, which synchronizes the generation of diagrams, text, and step-by-step solutions; 2) Formal Verification, ensuring all reasoning paths are rule-compliant; 3) Connection Thinking, bridging formal deduction with human-like logical steps; and 4) our GeoExplore series algorithms, which produce diverse problem variants with multiple solutions and self-reflective backtracking.
arXiv Detail & Related papers (2025-04-22T10:45:23Z) - Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning [20.562109430526007]
Chain-of-Thought (CoT) reasoning has proven effective in natural language tasks but remains underexplored in multimodal alignment. This study investigates its integration into 3D vision-language learning by embedding structured reasoning into alignment training.
arXiv Detail & Related papers (2025-03-08T14:24:54Z) - Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback [94.25162866972077]
Step-KTO is a training framework that combines process-level and outcome-level binary feedback. Our experiments show that Step-KTO significantly improves both final answer accuracy and the quality of intermediate reasoning steps.
arXiv Detail & Related papers (2025-01-18T15:38:03Z) - Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales. Our work is also significant because both quantitative analyses and manual evaluations reveal that the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z) - Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - G3Reg: Pyramid Graph-based Global Registration using Gaussian Ellipsoid Model [21.189016878269104]
This study introduces a novel framework, G3Reg, for fast and robust global registration of LiDAR point clouds.
In contrast to conventional complex keypoints and descriptors, we extract fundamental geometric primitives.
We present a distrust-and-verify scheme based on a Pyramid Graph for Global Registration.
arXiv Detail & Related papers (2023-08-22T17:23:00Z)