Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
- URL: http://arxiv.org/abs/2511.07943v2
- Date: Fri, 14 Nov 2025 05:24:15 GMT
- Title: Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
- Authors: Jun Xu, Xinkai Du, Yu Ao, Peilong Zhao, Yang Li, Ling Zhong, Lin Yuan, Zhongpu Bo, Xiaorui Wang, Mengshu Sun, Zhengke Gui, Dalong Zhang, Zhaoyang Wang, Qiwei Wang, Yangyang Hou, Zhiying Yin, Haofen Wang, Huajun Chen, Lei Liang, Jun Zhou,
- Abstract summary: Thinker is a hierarchical thinking model for deep search through multi-turn interaction.<n>It decomposes complex problems into independently solvable sub-problems.<n> dependencies between sub-problems are passed as parameters via these logical functions.
- Score: 57.67217258741752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes. The source code is available at https://github.com/OpenSPG/KAG-Thinker.
Related papers
- KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation [35.555200530999365]
We introduce KAG-Thinker, which upgrade KAG to a multi-turn interactive thinking and deep reasoning framework powered by a dedicated parameter-light large language model (LLM)<n>Our approach constructs a structured thinking process for solving complex problems, enhancing the the logical coherence and contextual consistency of the reasoning process.
arXiv Detail & Related papers (2025-06-21T14:58:53Z) - LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning [29.047063129464494]
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning unfamiliar settings.<n>This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specifics rather than fostering general-purpose thinking strategies.<n>We propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles.
arXiv Detail & Related papers (2025-06-05T09:40:47Z) - Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z) - Thinkless: LLM Learns When to Think [57.857534644932194]
Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference.<n>We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning.<n>On several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless is able to reduce the usage of long-chain thinking by 50% - 90%.
arXiv Detail & Related papers (2025-05-19T17:24:16Z) - ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving [4.987786842464663]
Tree-of-thoughts (ToT) offers a conceptually more advanced approach by modeling reasoning as an exploration within a tree structure.<n>ToTRL is designed to guide LLMs in developing the parallel ToT strategy based on the sequential CoT strategy.<n>Our ToTQwen3-8B model, trained with ToTRL, achieves significant improvement in performance and reasoning efficiency on complex reasoning tasks.
arXiv Detail & Related papers (2025-05-19T05:18:58Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - Supervised Chain of Thought [5.389461633686935]
Chain of Thought (CoT) prompting offers a promising approach to solving complex reasoning tasks.
One-prompt-for-all approach poses significant challenges for models to generate the correct reasoning steps.
We show how task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance.
arXiv Detail & Related papers (2024-10-18T06:25:27Z) - FLARE: Faithful Logic-Aided Reasoning and Exploration [47.46564769245296]
We introduce a novel approach for traversing the problem space using task decompositions.<n>We use the Large Language Models to plan a solution, soft-formalise the query into facts and predicates using a logic programming code.<n>Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Multi-Step Reasoning with Large Language Models, a Survey [8.647697652065718]
This article reviews the field of multi-step reasoning with large language models (LLMs)<n>We propose a taxonomy that identifies different ways to generate, evaluate, and control multi-step reasoning.<n>We find that multi-step reasoning approaches have progressed beyond math word problems, and can now successfully solve challenges in logic, games, and robotics.
arXiv Detail & Related papers (2024-07-16T08:49:35Z) - LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs)<n>This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z) - Search-in-the-Chain: Interactively Enhancing Large Language Models with
Search for Knowledge-intensive Tasks [121.74957524305283]
This paper proposes a novel framework named textbfSearch-in-the-Chain (SearChain) for the interaction between Information Retrieval (IR) and Large Language Model (LLM)
Experiments show that SearChain outperforms state-of-the-art baselines on complex knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-28T10:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.