Entropy-based Exploration Conduction for Multi-step Reasoning
- URL: http://arxiv.org/abs/2503.15848v1
- Date: Thu, 20 Mar 2025 05:03:26 GMT
- Title: Entropy-based Exploration Conduction for Multi-step Reasoning
- Authors: Jinghan Zhang, Xiting Wang, Fengran Mo, Yeyang Zhou, Wanfu Gao, Kunpeng Liu,
- Abstract summary: In large language model (LLM) reasoning, multi-step processes have proven effective for solving complex tasks.<n>Existing methods to automatically decide the depth often bring high costs and lack flexibility.<n>We propose Entropy-based Exploration Depth Conduction (Entro-duction) to dynamically adjust the exploration depth.
- Score: 15.589134593402589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In large language model (LLM) reasoning, multi-step processes have proven effective for solving complex tasks. However, the depth of exploration can significantly affect the reasoning performance. Existing methods to automatically decide the depth often bring high costs and lack flexibility, and thus undermine the model's reasoning accuracy. To address these issues, we propose Entropy-based Exploration Depth Conduction (Entro-duction), a novel method that dynamically adjusts the exploration depth during multi-step reasoning by monitoring LLM's output entropy and variance entropy. We employ these two metrics to capture the model's current uncertainty and the fluctuation of uncertainty across consecutive reasoning steps. Based on the observed changes, the LLM selects whether to deepen, expand or stop exploration according to the probability. In this way, we balance the reasoning accuracy and exploration effectiveness. Experimental results across four benchmark datasets demonstrate the efficacy of Entro-duction. We further conduct experiments and analysis on the components of Entro-duction to discuss their contributions to reasoning performance.
Related papers
- Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment [54.62926010621013]
We introduce a novel task, code reasoning, to provide a new perspective for the reasoning abilities of large language models.<n>We summarize three meta-benchmarks based on established forms of logical reasoning, and instantiate these into eight specific benchmark tasks.<n>We present a new pathway exploration pipeline inspired by human intricate problem-solving methods.
arXiv Detail & Related papers (2025-02-17T10:39:58Z) - Is Depth All You Need? An Exploration of Iterative Reasoning in LLMs [11.896234713853298]
We investigate whether relevant knowledge that contributes directly to solving the given question can be activated from the initial reasoning path.<n>Our experiments reveal that increasing the diversity of initial reasoning paths can achieve comparable or superior performance.<n>We propose a simple yet effective method that enhances reasoning breadth by integrating contextual exploration with reduced sampling randomness.
arXiv Detail & Related papers (2025-02-15T16:59:59Z) - The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective [18.389232051345825]
In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents in local optima.
This paper revisits the exploration-exploitation dilemma from the perspective of entropy.
We establish an end-to-end adaptive framework called AdaZero, which automatically determines whether to explore or to exploit.
arXiv Detail & Related papers (2024-08-19T13:21:46Z) - Uncertainty Quantification for Forward and Inverse Problems of PDEs via
Latent Global Evolution [110.99891169486366]
We propose a method that integrates efficient and precise uncertainty quantification into a deep learning-based surrogate model.
Our method endows deep learning-based surrogate models with robust and efficient uncertainty quantification capabilities for both forward and inverse problems.
Our method excels at propagating uncertainty over extended auto-regressive rollouts, making it suitable for scenarios involving long-term predictions.
arXiv Detail & Related papers (2024-02-13T11:22:59Z) - DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy [76.58614128865652]
We propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy.
First, we categorize known conditions into two types: determinate and indeterminate premises This provides an oveall direction for the reasoning process and guides LLMs in converting indeterminate data into progressively determinate insights.
We automate the storage and extraction of available premises and reasoning paths with reasoning memory, preserving historical reasoning details for subsequent reasoning steps.
arXiv Detail & Related papers (2023-10-28T10:05:51Z) - Counterfactual Learning with Multioutput Deep Kernels [0.0]
In this paper, we address the challenge of performing counterfactual inference with observational data.
We present a general class of counterfactual multi-task deep kernels models that estimate causal effects and learn policies proficiently.
arXiv Detail & Related papers (2022-11-20T23:28:41Z) - Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the R'enyi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - ADER:Adapting between Exploration and Robustness for Actor-Critic
Methods [8.750251598581102]
We show that TD3's performance lags behind the vanilla actor-critic methods in some primitive environments.
We propose a novel algorithm toward this problem that ADapts between Exploration and Robustness, namely ADER.
Experiments in several challenging environments demonstrate the supremacy of the proposed method in continuous control tasks.
arXiv Detail & Related papers (2021-09-08T05:48:39Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.