Related papers: AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models

URL: http://arxiv.org/abs/2505.17312v3
Date: Fri, 27 Jun 2025 19:19:38 GMT
Title: AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models
Authors: Xiangqi Wang, Yue Huang, Yanbo Wang, Xiaonan Luo, Kehan Guo, Yujun Zhou, Xiangliang Zhang
Abstract summary: AdaReasoner is an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy. It consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
Score: 32.51746551988431
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLMs often need effective configurations, such as temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work 'well enough' across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments showing fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
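The abstract names the ingredients of AdaReasoner's training (a factorized action space over reasoning configurations, a targeted exploration strategy, and a reward model that scores outcomes) without spelling them out. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation: the factor values, the annealed Boltzmann exploration schedule, the REINFORCE update with a running baseline, and `toy_reward` (a stand-in for the pretrained reward model) are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical discretized factors; the values are illustrative, not from the paper.
TEMPERATURES = [0.2, 0.5, 0.8, 1.0]
REASONING_STEPS = [1, 3, 5, 8]
INSTRUCTIONS = ["step-by-step", "analogical", "critical", "creative"]
FACTORS = [TEMPERATURES, REASONING_STEPS, INSTRUCTIONS]


def softmax(logits, tau=1.0):
    """Boltzmann distribution over logits; a larger tau spreads probability (more exploration)."""
    m = max(logits)
    exps = [math.exp((x - m) / tau) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FactorizedPolicy:
    """One independent categorical head per configuration factor."""

    def __init__(self, factors):
        self.factors = factors
        self.logits = [[0.0] * len(f) for f in factors]

    def sample(self, tau=1.0):
        """Sample each head independently; return chosen indices and the configuration."""
        idx = [random.choices(range(len(h)), weights=softmax(h, tau))[0] for h in self.logits]
        config = tuple(f[i] for f, i in zip(self.factors, idx))
        return idx, config

    def update(self, idx, advantage, tau=1.0, lr=0.1):
        """REINFORCE step. The joint log-prob is a sum over heads, so each head
        gets d log p(chosen) / d logit_k = (1[k == chosen] - p_k) / tau."""
        for head, chosen in zip(self.logits, idx):
            probs = softmax(head, tau)
            for k in range(len(head)):
                grad = ((1.0 if k == chosen else 0.0) - probs[k]) / tau
                head[k] += lr * advantage * grad

    def greedy(self):
        """Most probable value of every factor."""
        return tuple(f[max(range(len(h)), key=h.__getitem__)]
                     for f, h in zip(self.factors, self.logits))


def toy_reward(config):
    """Hypothetical stand-in for the pretrained reward model."""
    temperature, steps, instruction = config
    return (float(temperature == 0.5) + float(steps == 5)
            + float(instruction == "step-by-step"))


def train(episodes=2000, seed=0):
    random.seed(seed)
    policy = FactorizedPolicy(FACTORS)
    baseline = 0.0
    for t in range(episodes):
        tau = max(0.3, 1.0 - t / episodes)  # assumed schedule: anneal from broad to focused
        idx, config = policy.sample(tau)
        reward = toy_reward(config)
        baseline += 0.05 * (reward - baseline)  # running-mean baseline reduces variance
        policy.update(idx, reward - baseline, tau=tau)
    return policy


if __name__ == "__main__":
    print(train().greedy())
```

Because each head keeps only its own logits, the 4 × 4 × 4 = 64 joint configurations are covered by 12 parameters; the price of this factorization is that the policy cannot represent correlations between factors.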

Related papers

RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow [19.502882116487005]
Remote sensing imagery presents vast, inherently unstructured spatial data. We propose RemoteReasoner, a flexible and robust workflow for remote sensing reasoning tasks. Preliminary experiments demonstrated that RemoteReasoner achieves remarkable performance across multi-granularity reasoning tasks.
arXiv Detail & Related papers (2025-07-25T13:58:11Z)
Preference-based Multi-Objective Reinforcement Learning [5.031225669460861]
This paper introduces preference-based MORL (Pb-MORL), which formalizes the integration of preferences into the MORL framework. To guide policy optimization using preferences, our method constructs a multi-objective reward model that aligns with the given preferences. Experiments in benchmark multi-objective tasks, a multi-energy management task, and an autonomous driving task on a multi-line highway show that our method performs competitively.
arXiv Detail & Related papers (2025-07-18T16:43:04Z)
LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning [29.047063129464494]
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. We propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles.
arXiv Detail & Related papers (2025-06-05T09:40:47Z)
PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z)
Guiding Reasoning in Small Language Models with LLM Assistance [23.3038074903744]
The limited reasoning capabilities of Small Language Models (SLMs) cast doubt on their suitability for tasks demanding deep, multi-step logical deduction. This paper introduces a framework called Small Reasons, Large Hints, which selectively augments SLM reasoning with targeted guidance from large language models. Our experiments on mathematical reasoning datasets demonstrate that targeted external scaffolding significantly improves performance.
arXiv Detail & Related papers (2025-04-14T06:32:45Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language Models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
Towards more Contextual Agents: An extractor-Generator Optimization Framework [0.0]
Large Language Model (LLM)-based agents have demonstrated remarkable success in solving complex tasks across a wide range of general-purpose applications. However, their performance often degrades in context-specific scenarios, such as specialized industries or research domains. To address this challenge, our work introduces a systematic approach to enhance the contextual adaptability of LLM-based agents.
arXiv Detail & Related papers (2025-02-18T15:07:06Z)
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving [55.895917967408586]
Existing approaches to mathematical reasoning with large language models rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. We propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously.
arXiv Detail & Related papers (2025-02-17T16:56:23Z)
Offline Reinforcement Learning for LLM Multi-Step Reasoning [15.687002884103537]
OREO (Offline Reasoning Optimization) is an offline reinforcement learning method for enhancing multi-step reasoning. It reduces the need to collect pairwise data and enables better credit assignment. It surpasses existing offline learning methods on multi-step reasoning benchmarks.
arXiv Detail & Related papers (2024-12-20T18:49:45Z)
Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees [3.4289478404209826]
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions.
arXiv Detail & Related papers (2024-10-21T08:21:00Z)
Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs). MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
Towards Generalist Prompting for Large Language Models by Mental Models [105.03747314550591]
Large language models (LLMs) have demonstrated impressive performance on many tasks. To achieve optimal performance, specially designed prompting methods are still needed. We introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance.
arXiv Detail & Related papers (2024-02-28T11:29:09Z)
LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs). This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z)
A Principled Framework for Knowledge-enhanced Large Language Model [58.1536118111993]
Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning. This paper introduces a rigorously designed framework for creating LLMs that effectively anchor knowledge and employ a closed-loop reasoning process.
arXiv Detail & Related papers (2023-11-18T18:10:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.