Related papers: ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations

URL: http://arxiv.org/abs/2510.27355v1
Date: Fri, 31 Oct 2025 10:40:19 GMT
Title: ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations
Authors: Zijian Wang, Chang Xu
Abstract summary: ThoughtProbe is a novel inference framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. We harness these hidden representations as discriminative signals to guide the tree-structured response space exploration. Our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
Score: 22.84446651161078
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces ThoughtProbe, a novel inference-time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. Unlike previous works that manipulate the hidden representations to steer LLM generation, we harness them as discriminative signals to guide the exploration of a tree-structured response space. In each node expansion, a classifier serves as a scoring and ranking mechanism that efficiently allocates computational resources by prioritizing higher-scoring candidates for continuation. After completing the tree expansion, we collect answers from all branches to form a candidate answer pool. We then propose a branch aggregation method that marginalizes over all supporting branches by aggregating their CoT scores, thereby identifying the optimal answer from the pool. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
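The two steps the abstract describes, ranking candidates at each node expansion and aggregating branch scores per answer, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `select_top_k` and `aggregate_branches` are hypothetical, the scoring function stands in for the classifier, and summation is assumed as the aggregation rule because the abstract does not specify how CoT scores are combined.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_top_k(candidates: Sequence[T], score_fn: Callable[[T], float], k: int) -> List[T]:
    """Keep the k highest-scoring candidates for continuation.

    score_fn is a stand-in for the classifier score (an assumption).
    """
    return sorted(candidates, key=score_fn, reverse=True)[:k]


def aggregate_branches(
    branches: Iterable[Tuple[Hashable, float]],
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Sum the scores of all branches that support each answer.

    Returns the answer with the largest total and the per-answer totals.
    Summation is an assumed aggregation rule, not one taken from the paper.
    """
    totals: Dict[Hashable, float] = defaultdict(float)
    for answer, cot_score in branches:
        totals[answer] += cot_score
    best = max(totals, key=lambda a: totals[a])
    return best, dict(totals)


# Hypothetical (answer, CoT score) pairs collected from finished branches.
branches = [("42", 0.9), ("41", 0.4), ("41", 0.3), ("42", 0.2), ("40", 0.6)]
best, totals = aggregate_branches(branches)
```

In this toy pool, "42" and "41" are each supported by two branches, so a plain frequency vote would tie; the score-weighted totals favor "42".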

Related papers

LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval [74.72139580745511]
LaSER is a novel self-distillation framework that internalizes explicit reasoning into the latent space of retrievers. Our method successfully combines the reasoning depth of explicit CoT pipelines with the inference efficiency of standard dense retrievers.
arXiv Detail & Related papers (2026-03-02T04:11:18Z)
TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval. We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
arXiv Detail & Related papers (2026-01-11T14:07:30Z)
Reinforced Efficient Reasoning via Semantically Diverse Exploration [73.41112984160992]
Reinforcement learning with verifiable rewards (RLVR) has proven effective in enhancing the reasoning of large language models (LLMs). We propose reinforced efficient reasoning via semantically diverse exploration (ROSE) for LLMs. Our method incorporates a semantic-entropy-based branching strategy and an $\varepsilon$-exploration mechanism.
arXiv Detail & Related papers (2026-01-08T15:56:44Z)
A Reasoning Paradigm for Named Entity Recognition [16.86833034216367]
A reasoning framework is proposed for Named Entity Recognition (NER). The framework consists of three stages: Chain of Thought (CoT) generation, CoT tuning, and reasoning enhancement. Experiments show ReasoningNER demonstrates impressive cognitive ability in the NER task, achieving competitive performance.
arXiv Detail & Related papers (2025-11-15T01:31:43Z)
In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback [38.915062716409686]
InTRO is a new framework that enables both token-level exploration and self-feedback for accurate and concise reasoning. InTRO consistently outperforms other baselines, raising solution accuracy by up to 20% relative to the base model. Its chains of thought are notably more concise, exhibiting reduced verbosity.
arXiv Detail & Related papers (2025-11-13T01:47:06Z)
Implicit Reasoning in Large Language Models: A Comprehensive Survey [67.53966514728383]
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning. This survey introduces a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies.
arXiv Detail & Related papers (2025-09-02T14:16:02Z)
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation [60.18907916989796]
Large Language Models (LLMs) generate chains of thought (CoTs) before giving the final answer. We propose a novel pipeline enriched with linguistically-grounded discourse segmenters to extract supporting and opposing statements for each answer option. We also propose a rank-based HLV evaluation framework that prioritizes the ranking of answers over exact scores.
arXiv Detail & Related papers (2025-05-29T11:47:18Z)
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning [20.082244529609707]
We make the key discovery that a simple linear classifier can effectively detect intrinsic reasoning capabilities in LLMs' activation space. We propose a classifier-guided search framework that strategically explores a tree-structured response space. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them.
arXiv Detail & Related papers (2025-04-09T07:37:27Z)
Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer [36.898636836302956]
Large language models have shown strong reasoning capabilities through chain-structured methods such as Chain-of-Thought. Recent studies optimize thought structures by generating parallel or tree-like structures, switching between long and short reasoning modes, or aligning reasoning steps with task performance. These approaches mainly rely on previously generated logical directions of the chains and therefore ignore the unexplored regions of the solution space. We propose the "Thought Space Explorer" (TSE), a framework for navigating and expanding thought structures to overcome blind spots in LLM reasoning.
arXiv Detail & Related papers (2024-10-31T17:12:14Z)
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency. This approach fails in scenarios where the correct answers are in the minority. We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z)
Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs). Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy. At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner [56.08919422452905]
We propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR). Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises. We outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
arXiv Detail & Related papers (2022-05-18T21:52:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.