Related papers: Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

URL: http://arxiv.org/abs/2601.13752v1
Date: Tue, 20 Jan 2026 09:07:01 GMT
Title: Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering
Authors: Chak Tou Leong, Dingwei Chen, Heming Xia, Qingyu Yin, Sunbowen Lee, Jian Wang, Wenjie Li,
Abstract summary: Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness.<n>We propose Reasoning Belief Engineering (RELIEF), a framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint.<n>RELIEF internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief.
Score: 25.183793455770978
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent \textit{reasoning beliefs} that internally track their own reasoning traits, which can be captured through simple logit probing. Building upon this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF completely bypasses the need for reasoning-trace supervision. It internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines while requiring lower training costs. Further analysis validates that shifting a model's reasoning belief effectively shapes its actual behavior.

Related papers

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach.<n>REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful.<n>Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z)
Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation [46.38008143057758]
Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable.<n>This work argues that reward modeling is not merely an implementation detail but a central architect of reasoning alignment.<n>Within this framework, we present a taxonomy of reward mechanisms, analyze reward hacking as a pervasive failure mode, and examine how reward signals unify challenges.
arXiv Detail & Related papers (2026-02-10T00:45:24Z)
Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models [59.6715047267181]
Small reasoning models (SRMs) are prone to hallucinations, especially in intermediate reasoning steps.<n>Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained chain-of-thought evaluation.<n>We propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model.
arXiv Detail & Related papers (2026-02-05T17:15:12Z)
How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors.<n>We introduce a novel benchmark that decomposes reasoning into atomic core skills.<n>We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z)
Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement [54.63337314382886]
We introduce self-rewriting framework, where a model rewrites its own reasoning texts, and subsequently learns from the rewritten reasoning to improve internal thought process quality.<n>For algorithm design, we propose a selective rewriting approach wherein only "simple" samples, defined by the model's consistent correctness, are rewritten.<n>Experiments on diverse tasks with different model sizes validate the effectiveness of self-rewriting.
arXiv Detail & Related papers (2025-11-20T13:10:52Z)
Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning [75.79451512757844]
Foundation models exhibit broad knowledge but limited task-specific reasoning.<n> RLVR and inference scaling motivate post-training strategies such as RLVR and inference scaling.<n>We show that RLVR induces a squeezing effect, reducing reasoning entropy and forgetting some correct paths.
arXiv Detail & Related papers (2025-11-10T18:25:26Z)
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models [43.872922223495586]
Large reasoning models (LRMs) show strong capabilities in complex reasoning, yet their marginal gains on evidence-dependent factual questions are limited.<n>We find this limitation is partially attributable to a reasoning-answer hit gap, where the model identifies the correct facts during reasoning but fails to incorporate them into the final response.<n>We propose MR-ALIGN, a framework that enhances factuality without relying on external verifiers.
arXiv Detail & Related papers (2025-10-27T15:00:54Z)
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization [62.07990937720985]
Dimension-level Reward Model (DRM) is a new supervision framework for Large Language Models.<n>DRM evaluates the quality of a reasoning process along three fundamental, complementary, and interpretable dimensions.<n> Experimental results show that DRM provides effective supervision signals, guides the optimization of LLMs and enhances their reasoning ability.
arXiv Detail & Related papers (2025-10-13T14:29:15Z)
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models [15.797612515648412]
Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning.<n>Recent studies reveal that their final answers often contradict their own reasoning traces.<n>We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval.<n>We introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning.
arXiv Detail & Related papers (2025-09-29T01:13:33Z)
Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs [8.359909829007005]
We investigate whether efficient reasoning strategies introduce behavioral inconsistencies in large reasoning models (LRMs)<n>$ICBENCH$ is a benchmark designed to measure inconsistency in LRMs across three dimensions.<n>We find that while larger models generally exhibit greater consistency than smaller ones, they all display widespread "scheming" behaviors.
arXiv Detail & Related papers (2025-06-24T10:25:28Z)
Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost.<n>We propose two lightweight methods to enhance LRM efficiency.<n>First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction.<n>Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
Thinking Out Loud: Do Reasoning Models Know When They're Right? [25.683190106676893]
Large reasoning models (LRMs) have recently demonstrated impressive capabilities in complex reasoning tasks.<n>We investigate how LRMs interact with other model behaviors by analyzing verbalized confidence.<n>We find that reasoning models may possess a diminished awareness of their own knowledge boundaries.
arXiv Detail & Related papers (2025-04-09T03:58:19Z)
Reward Models Identify Consistency, Not Causality [54.987590763737145]
State-of-the-art reward models prioritize structural consistency over causal correctness.<n>Removing the problem statement has minimal impact on reward scores.<n> altering numerical values or disrupting the reasoning flow significantly affects RM outputs.
arXiv Detail & Related papers (2025-02-20T14:57:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.