Related papers: ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment

ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment

URL: http://arxiv.org/abs/2512.24040v1
Date: Tue, 30 Dec 2025 07:31:34 GMT
Title: ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment
Authors: Natchaya Temyingyong, Daman Jain, Neeraj Kumarsahu, Prabhat Kumar, Rachata Phondi, Wachiravit Modecrua, Krittanon Kaewtawee, Krittin Pachtrachai, Touchapon Kraisingkorn,
Abstract summary: ROAD is a novel framework that treats optimization as a dynamic debug investigation rather than a search.<n>Road is highly sample-efficient, achieving a 5.6 percent increase in success rate and a 3.8 percent increase in search accuracy.<n>These findings suggest that mimicking the human engineering loop of failure analysis and patching offers a viable, data-efficient alternative to resource-intensive training.
Score: 1.6968020497268546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic Prompt Optimization (APO) has emerged as a critical technique for enhancing Large Language Model (LLM) performance, yet current state-of-the-art methods typically rely on large, labeled gold-standard development sets to compute fitness scores for evolutionary or Reinforcement Learning (RL) approaches. In real-world software engineering, however, such curated datasets are rarely available during the initial cold start of agent development, where engineers instead face messy production logs and evolving failure modes. We present ROAD (Reflective Optimization via Automated Debugging), a novel framework that bypasses the need for refined datasets by treating optimization as a dynamic debugging investigation rather than a stochastic search. Unlike traditional mutation strategies, ROAD utilizes a specialized multi-agent architecture, comprising an Analyzer for root-cause analysis, an Optimizer for pattern aggregation, and a Coach for strategy integration, to convert unstructured failure logs into robust, structured Decision Tree Protocols. We evaluated ROAD across both a standardized academic benchmark and a live production Knowledge Management engine. Experimental results demonstrate that ROAD is highly sample-efficient, achieving a 5.6 percent increase in success rate (73.6 percent to 79.2 percent) and a 3.8 percent increase in search accuracy within just three automated iterations. Furthermore, on complex reasoning tasks in the retail domain, ROAD improved agent performance by approximately 19 percent relative to the baseline. These findings suggest that mimicking the human engineering loop of failure analysis and patching offers a viable, data-efficient alternative to resource-intensive RL training for deploying reliable LLM agents.

Related papers

Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks.<n>Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and accuracy is an unreliable guide for choice architecture.
arXiv Detail & Related papers (2026-02-26T02:45:22Z)
EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection.<n>However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight.<n>We introduce textscEmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
HeaRT: A Hierarchical Circuit Reasoning Tree-Based Agentic Framework for AMS Design Optimization [13.18012004667103]
HeaRT is a foundational reasoning engine for automation loops and a first step toward intelligent, adaptive, human-style design optimization.<n>HeaRT consistently demonstrates reasoning accuracy 97% and Pass@1 performance 98% across our 40-circuit benchmark repository.<n>Our experiments show that HeaRT yields 3x faster convergence in both sizing and topology design adaptation tasks.
arXiv Detail & Related papers (2025-11-24T20:11:06Z)
GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents [0.32839375042867835]
textitGrowthHacker is a benchmark with agent and baseline methods on large-scale real-world datasets.<n>We develop the textittwo_agent framework, which reduces system complexity while preserving optimization effectiveness.<n>Results show the two_agent framework achieves 100% reliability and the highest average improvement of 106.7%.
arXiv Detail & Related papers (2025-11-02T04:47:17Z)
Demystifying Reinforcement Learning in Agentic Reasoning [90.3737088727791]
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning.<n>We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT.<n> Exploration-friendly techniques are crucial for agentic RL, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy could improve the training efficiency.
arXiv Detail & Related papers (2025-10-13T17:57:15Z)
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation [21.08814504507274]
suboptimal search behaviors exist widely, such as over-search and under-search.<n>Current training methods, which typically rely on outcome-based rewards in a RL framework, lack the fine-grained control needed to address these inefficiencies.<n>We introduce HiPRAG, a training methodology that incorporates a fine-grained, knowledge-grounded process reward into the RL training.
arXiv Detail & Related papers (2025-10-09T05:13:10Z)
eARCO: Efficient Automated Root Cause Analysis with Prompt Optimization [15.299667843493491]
Root cause analysis (RCA) for incidents in large-scale cloud systems is a complex, knowledge-intensive task.<n>Recent advancements in Large-Language Models (LLMs) have proven to be effective in solving different stages of the incident management lifecycle.<n>We leverage 'PromptWizard', a state-of-the-art prompt optimization technique, to automatically identify the best optimized prompt instruction.
arXiv Detail & Related papers (2025-04-15T08:10:32Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development.<n>We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents.<n>We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models [79.84205827056907]
We present Self-Steering Optimization ($SSO$), an algorithm that autonomously generates high-quality preference data.<n>$SSO$ employs a specialized optimization objective to build a data generator from the policy model itself, which is used to produce accurate and on-policy data.<n>Our evaluation shows that $SSO$ consistently outperforms baselines in human preference alignment and reward optimization.
arXiv Detail & Related papers (2024-10-22T16:04:03Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.