ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents
- URL: http://arxiv.org/abs/2511.14584v1
- Date: Tue, 18 Nov 2025 15:25:05 GMT
- Title: ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents
- Authors: Ankush Kadu, Ashwanth Krishnan
- Abstract summary: We introduce ReflexGrad, a novel architecture that tightly couples three complementary mechanisms. Our system achieves true zero-shot generalization through pure semantic reasoning. Our work demonstrates that synergistic integration of complementary learning mechanisms enables robust zero-shot generalization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enabling agents to learn from experience and generalize across diverse tasks without task-specific training remains a fundamental challenge in reinforcement learning and decision-making. While recent approaches have explored episodic memory (Reflexion), gradient-based prompt optimization (TextGrad), and hierarchical task decomposition independently, their potential for synergistic integration remains unexplored. We introduce ReflexGrad, a novel architecture that tightly couples three complementary mechanisms: (1) LLM-based hierarchical TODO decomposition for strategic planning, (2) history-aware causal reflection that analyzes recent action patterns to identify failure root causes and enable within-trial learning, and (3) gradient-based optimization for systematic improvement. Unlike prior work relying on few-shot demonstrations, our system achieves true zero-shot generalization through pure LLM semantic reasoning, requiring no task-specific examples, fine-tuning, or hardcoded similarity metrics. Evaluated on ALFWorld benchmark tasks, ReflexGrad demonstrates a 67% zero-shot success rate on Trial 0 without any prior task experience or demonstrations, establishing effective performance on first exposure. Through empirical analysis, we identify the architectural mechanisms underlying stable convergence (zero action loops) and effective cross-task transfer (67% to 78% improvement). Our work demonstrates that synergistic integration of complementary learning mechanisms enables robust zero-shot generalization that approaches few-shot baselines from prior work.
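The three coupled mechanisms in the abstract can be sketched as a simple agent loop. All class, function, and prompt names below are illustrative assumptions for exposition; the paper does not publish this code, and the LLM call is stubbed.

```python
# Illustrative sketch of ReflexGrad's three coupled mechanisms
# (hypothetical names, not the authors' implementation).

def llm(prompt: str) -> str:
    """Stand-in for an LLM call (stubbed for the sketch)."""
    return "subgoal: locate object | subgoal: use object"

class ReflexGradAgent:
    def __init__(self):
        self.system_prompt = "You are a household-task agent."
        self.history = []  # (action, success) pairs within the trial

    def decompose(self, task: str) -> list:
        # (1) Hierarchical TODO decomposition via the LLM.
        plan = llm(f"{self.system_prompt}\nDecompose: {task}")
        return [s.strip() for s in plan.split("|")]

    def reflect(self) -> str:
        # (2) History-aware causal reflection over recent actions,
        # looking for the root cause of repeated failures.
        recent = self.history[-3:]
        failures = [a for a, ok in recent if not ok]
        if failures:
            return f"Root cause candidate: failure of {failures[-1]!r}"
        return "No failure pattern detected."

    def textgrad_step(self, feedback: str) -> None:
        # (3) Gradient-style prompt optimization: fold textual
        # feedback back into the working prompt.
        self.system_prompt += f"\nLesson: {feedback}"

    def act(self, action: str, success: bool) -> None:
        self.history.append((action, success))
        if not success:
            self.textgrad_step(self.reflect())

agent = ReflexGradAgent()
todos = agent.decompose("heat some egg and put it in diningtable")
agent.act("open fridge", True)
agent.act("heat egg with microwave", False)
print(len(todos), "subgoals;", "Lesson:" in agent.system_prompt)
```

The key design point suggested by the abstract is the coupling: reflection output feeds directly into the prompt update, so within-trial failures reshape subsequent planning without any task-specific examples.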
Related papers
- ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation [54.071574153853994]
ProRAG is a process-supervised reinforcement learning framework designed to integrate learned step-level supervision into the online optimization loop. Our framework consists of four stages: (1) Supervised Policy Warmup to initialize the model with a structured reasoning format; (2) construction of an MCTS-based Process Reward Model (PRM) to quantify intermediate reasoning quality; (3) PRM-Guided Reasoning Refinement to align the policy with fine-grained process preferences; and (4) Process-Supervised Reinforcement Learning with a dual-granularity advantage mechanism.
arXiv Detail & Related papers (2026-01-29T16:04:59Z) - Implicit Neural Representation-Based Continuous Single Image Super Resolution: An Empirical Study [50.15623093332659]
Implicit neural representation (INR) has become the standard approach for arbitrary-scale image super-resolution (ASSR). We compare existing techniques across diverse settings and present aggregated performance results on multiple image quality metrics. We examine a new loss function that penalizes intensity variations while preserving edges, textures, and finer details during training.
arXiv Detail & Related papers (2026-01-25T07:09:20Z) - Integrating Diverse Assignment Strategies into DETRs [61.61489761918158]
Label assignment is a critical component in object detectors, particularly within DETR-style frameworks. We propose LoRA-DETR, a flexible and lightweight framework that seamlessly integrates diverse assignment strategies into any DETR-style detector.
arXiv Detail & Related papers (2026-01-14T07:28:54Z) - Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning [58.533203990515034]
Scaling neural networks has driven breakthrough advances in machine learning, yet this paradigm fails in deep reinforcement learning (DRL). We show that dynamic sparse training strategies provide module-specific benefits that complement the primary scalability foundation established by architectural improvements. We finally distill these insights into Module-Specific Training (MST), a practical framework that exploits the benefits of architectural improvements and demonstrates substantial scalability gains across diverse RL algorithms without algorithmic modifications.
arXiv Detail & Related papers (2025-10-14T03:03:08Z) - SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection [14.40651157974557]
SAMULE is a new framework for self-learning agents powered by a retrospective language model trained via Multi-Level Reflection Synthesis. It first synthesizes high-quality reflections across three complementary levels: Single-Trajectory Learning (micro-level) for detailed error correction; Intra-Task Learning (meso-level) to aggregate errors across multiple trials of the same task; and Inter-Task Learning (macro-level) to extract transferable insights from same-typed errors across diverse task failures.
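The micro/meso/macro reflection levels described above can be sketched as three aggregation functions. The function names and the boolean-trajectory representation are hypothetical simplifications, not SAMULE's actual method.

```python
# Hypothetical sketch of multi-level reflection synthesis
# (micro / meso / macro), loosely following the SAMULE description.

def reflect_micro(trajectory):
    # Single-Trajectory Learning: per-step error correction,
    # where each entry marks whether the step succeeded.
    return [f"fix step {i}" for i, ok in enumerate(trajectory) if not ok]

def reflect_meso(trials):
    # Intra-Task Learning: aggregate errors across trials of one task
    # and keep only the recurring ones.
    counts = {}
    for trial in trials:
        for lesson in reflect_micro(trial):
            counts[lesson] = counts.get(lesson, 0) + 1
    return [lesson for lesson, c in counts.items() if c > 1]

def reflect_macro(task_lessons):
    # Inter-Task Learning: keep lessons shared across different tasks.
    shared = set(task_lessons[0])
    for lessons in task_lessons[1:]:
        shared &= set(lessons)
    return sorted(shared)

trials = [[True, False], [True, False]]
print(reflect_meso(trials))  # recurring error within one task
print(reflect_macro([["fix step 1"], ["fix step 1", "fix step 0"]]))
```

Each level filters more aggressively than the one below it, which mirrors the paper's claim that macro-level insights are the transferable residue of many task-specific failures.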
arXiv Detail & Related papers (2025-09-24T21:02:15Z) - RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents [43.806220882212386]
RLVMR integrates dense, process-level supervision into end-to-end RL by rewarding verifiable, meta-reasoning behaviors. On the challenging ALFWorld and ScienceWorld benchmarks, RLVMR achieves new state-of-the-art results.
arXiv Detail & Related papers (2025-07-30T17:00:48Z) - Conditional Multi-Stage Failure Recovery for Embodied Agents [17.95974193288372]
We introduce a conditional multi-stage failure recovery framework that employs zero-shot chain prompting. We evaluate our method on the TfD benchmark of the TEACH dataset and achieve state-of-the-art performance.
arXiv Detail & Related papers (2025-07-08T14:23:41Z) - Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning [41.67411509781136]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks. Existing approaches generate open-loop action scripts based on static knowledge. We introduce Embodied Planner-R1, a novel outcome-driven reinforcement learning framework.
arXiv Detail & Related papers (2025-06-29T07:31:24Z) - OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections [0.8123746895372843]
We introduce OmniReflect, a reflection-driven framework to improve Large Language Model (LLM) agent performance on complex tasks. We employ Neural, Reflex, and NeuroSymbolic techniques, offering a balance between contextual adaptability and computational efficiency. Empirical results averaged across models show major improvements in task success, with absolute gains of +10.3% on ALFWorld, +23.8% on BabyAI, and +8.3% on PDDL.
arXiv Detail & Related papers (2025-06-20T19:38:21Z) - ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding [71.654781631463]
ReAgent-V is a novel agentic video understanding framework. It integrates efficient frame selection with real-time reward generation during inference. Extensive experiments on 12 datasets demonstrate significant gains in generalization and reasoning.
arXiv Detail & Related papers (2025-06-02T04:23:21Z) - What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance. We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z) - Devil's Advocate: Anticipatory Reflection for LLM Agents [53.897557605550325]
Our approach prompts LLM agents to decompose a given task into manageable subtasks. We implement a three-fold introspective intervention: anticipatory reflection on potential failures and alternative remedies before action execution, followed by post-action alignment with subtask objectives and backtracking with remedy to ensure utmost effort in plan execution.
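The anticipatory-reflection pattern above can be sketched as a pre-action forecast plus a backtracking fallback. The helper names and stubbed environment below are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of anticipatory reflection: before each action,
# ask for likely failure modes and a remedy; if the action fails,
# backtrack and apply the pre-computed remedy.

def llm(prompt: str) -> str:
    """Stand-in LLM call (stubbed)."""
    return "failure: drawer locked; remedy: try adjacent drawer"

def execute(action: str) -> bool:
    """Stand-in environment step; pretend only this one action fails."""
    return action != "open drawer 1"

def run_subtask(action: str) -> str:
    # Anticipatory reflection BEFORE acting.
    forecast = llm(f"What could go wrong with: {action}?")
    remedy = forecast.split("remedy:")[-1].strip()
    if execute(action):
        return action  # post-action check passed
    # Backtrack and apply the remedy prepared in advance.
    return remedy if execute(remedy) else "abort"

print(run_subtask("open drawer 1"))
```

Computing the remedy before execution is what distinguishes this from purely post-hoc reflection: the recovery path is already available at the moment of failure.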
arXiv Detail & Related papers (2024-05-25T19:20:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.