Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making
- URL: http://arxiv.org/abs/2512.08366v1
- Date: Tue, 09 Dec 2025 08:44:59 GMT
- Title: Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making
- Authors: Wentao Zhang, Qunbo Wang, Tao Zhang, Junsheng Wu, Hongping Gan, Yang Liu, Ling Dai, Shizhuang Deng, Shuntong Sun
- Abstract summary: Large language model (LLM) agents often rely on external demonstrations or retrieval-augmented planning. We propose DuSAR - a demonstration-free framework that enables a single frozen LLM to perform co-adaptive reasoning. On ALFWorld and Mind2Web, DuSAR achieves state-of-the-art performance with open-source LLMs.
- Score: 24.534365665776672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) agents often rely on external demonstrations or retrieval-augmented planning, leading to brittleness, poor generalization, and high computational overhead. Inspired by human problem-solving, we propose DuSAR (Dual-Strategy Agent with Reflecting) - a demonstration-free framework that enables a single frozen LLM to perform co-adaptive reasoning via two complementary strategies: a high-level holistic plan and a context-grounded local policy. These strategies interact through a lightweight reflection mechanism, where the agent continuously assesses progress via a Strategy Fitness Score and dynamically revises its global plan when stuck or refines it upon meaningful advancement, mimicking human metacognitive behavior. On ALFWorld and Mind2Web, DuSAR achieves state-of-the-art performance with open-source LLMs (7B-70B), reaching 37.1% success on ALFWorld (Llama3.1-70B) - more than doubling the best prior result (13.0%) - and 4.02% on Mind2Web, also more than doubling the strongest baseline. Remarkably, it reduces per-step token consumption by 3-9X while maintaining strong performance. Ablation studies confirm the necessity of dual-strategy coordination. Moreover, optional integration of expert demonstrations further boosts results, highlighting DuSAR's flexibility and compatibility with external knowledge.
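The abstract describes a reflection loop: a high-level plan and a local policy interact, progress is tracked with a Strategy Fitness Score, and the global plan is revised when the agent is stuck or refined when it advances. A minimal sketch of that control flow is below; all names, thresholds, and the stub plan/policy/scoring functions are illustrative assumptions, not the paper's actual (LLM-based) components.

```python
# Hypothetical sketch of a DuSAR-style dual-strategy reflection loop.
# The sub-goals, fitness proxy, and threshold are illustrative stand-ins
# for the paper's LLM-generated plans and Strategy Fitness Score.

def holistic_plan(task):
    """High-level holistic plan: an ordered list of sub-goals (stubbed)."""
    return ["locate object", "pick up object", "place object"]

def local_policy(sub_goal, observation):
    """Context-grounded local action for the current sub-goal (stubbed)."""
    return f"act:{sub_goal} given {observation}"

def strategy_fitness(history):
    """Toy fitness proxy: fraction of recent steps that made progress."""
    if not history:
        return 1.0
    recent = history[-3:]
    return sum(recent) / len(recent)

def run_agent(task, env_steps, stuck_threshold=0.34):
    """env_steps: iterable of (progressed, observation) pairs simulating
    environment feedback; returns the log of actions and plan revisions."""
    plan = holistic_plan(task)
    goal_idx, history, log = 0, [], []
    for progressed, observation in env_steps:
        log.append(local_policy(plan[goal_idx], observation))
        history.append(1 if progressed else 0)
        if strategy_fitness(history) < stuck_threshold:
            # Stuck: revise the global plan (here: simply re-plan).
            plan, goal_idx, history = holistic_plan(task), 0, []
            log.append("revise-plan")
        elif progressed and goal_idx < len(plan) - 1:
            # Meaningful advancement: refine by moving to the next sub-goal.
            goal_idx += 1
    return log

steps = [(True, "o1"), (True, "o2"), (False, "o3"), (False, "o4"), (False, "o5")]
trace = run_agent("put object on shelf", steps)
```

In this toy run, two early successes advance the agent through the plan, and the later run of failures drives the fitness score below the threshold, triggering a plan revision - mirroring the "revise when stuck, refine on advancement" behavior the abstract attributes to the reflection mechanism.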
Related papers
- Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents [49.119608399413806]
Large language models (LLMs) are increasingly deployed as autonomous agents for multi-turn decision-making tasks. This paper introduces Cog, a framework that trains agents to dynamically adapt cognitive depth at each step. Experiments on ALFWorld and ScienceWorld demonstrate that Cog achieves state-of-the-art performance with superior efficiency.
arXiv Detail & Related papers (2026-02-13T06:52:09Z) - Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization [51.12422886183246]
Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks. Existing works tend to focus on either isolated jailbreak attacks or static defenses, neglecting the dynamic interplay between evolving threats and safeguards in real-world web contexts. We propose ACE-Safety, a novel framework that jointly optimizes attack and defense models by seamlessly integrating two key innovative procedures.
arXiv Detail & Related papers (2025-11-24T15:23:41Z) - ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents [0.0]
We introduce ReflexGrad, a novel architecture that tightly couples three complementary mechanisms. Our system achieves true zero-shot generalization through pure semantic reasoning. Our work demonstrates that synergistic integration of complementary learning mechanisms enables robust zero-shot generalization.
arXiv Detail & Related papers (2025-11-18T15:25:05Z) - Graph-Enhanced Policy Optimization in LLM Agent Training [3.177432419321498]
Group-based reinforcement learning (RL) has shown impressive results on complex reasoning and mathematical tasks.
arXiv Detail & Related papers (2025-10-30T08:53:41Z) - Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts [63.412646471177645]
We propose a novel Reinforced Strategy Optimization (RSO) method for Conversational Recommender Systems (CRSs). RSO decomposes the process of generating strategy-driven response decisions into macro-level strategy planning and micro-level strategy adaptation. Experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-30T11:12:01Z) - Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning [56.496001894673235]
Reinforcement Learning (RL) has proven highly effective at enhancing the complex reasoning abilities of Large Language Models (LLMs). Our analysis reveals that puzzling phenomena like "aha moments", "length-scaling", and entropy dynamics are not disparate occurrences but hallmarks of an emergent reasoning hierarchy.
arXiv Detail & Related papers (2025-09-03T18:52:49Z) - SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents [58.174206358223415]
Self-Evolving Embodied Agents-R1, or SEEA-R1, is the first reinforcement fine-tuning framework designed for self-evolving embodied agents. We show that SEEA-R1 can support autonomous adaptation and reward-driven self-evolution.
arXiv Detail & Related papers (2025-06-26T18:00:07Z) - OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections [0.8123746895372843]
We introduce OmniReflect, a reflection-driven framework to improve Large Language Model (LLM) agent performance on complex tasks. We employ Neural, Reflex, and NeuroSymbolic techniques, offering a balance between contextual adaptability and computational efficiency. Empirical results averaged across models show major improvements in task success, with absolute gains of +10.3% on ALFWorld, +23.8% on BabyAI, and +8.3% on PDDL.
arXiv Detail & Related papers (2025-06-20T19:38:21Z) - LLM Agents for Bargaining with Utility-based Feedback [23.357706450282002]
We introduce a comprehensive framework centered on utility-based feedback. Our contributions are threefold: (1) BargainArena, a novel benchmark dataset; (2) human-aligned, economically-grounded evaluation metrics inspired by utility theory; and (3) a structured feedback mechanism enabling LLMs to iteratively refine their bargaining strategies.
arXiv Detail & Related papers (2025-05-29T02:07:27Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs [8.55917897789612]
We propose Retrieval-Augmented in-context reinforcement Learning (RAHL) for large language models.
RAHL decomposes complex tasks into sub-tasks using an LLM-based high-level policy.
We show that RAHL achieves performance improvements of 9%, 42%, and 10% over strong baselines within 5 episodes of execution.
arXiv Detail & Related papers (2024-08-12T22:40:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.