ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters
- URL: http://arxiv.org/abs/2602.02709v1
- Date: Mon, 02 Feb 2026 19:23:33 GMT
- Title: ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters
- Authors: Ujin Jeon, Jiyong Kwon, Madison Ann Sullivan, Caleb Eunho Lee, Guang Lin,
- Abstract summary: ATLAS is a task-distributed framework that iteratively develops a lightweight research agent. Our core algorithm, Evolving Direct Preference Optimization (EvoDPO), adaptively updates the phase-indexed reference policy. Results show that ATLAS improves stability and performance over a static single-agent baseline.
- Score: 6.13905106667213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving, but many either keep the solver frozen after fine-tuning or rely on a static preference-optimization loop, which becomes intractable for long-horizon tasks. We propose ATLAS (Adaptive Task-distributed Learning for Agentic Self-evolution), a task-distributed framework that iteratively develops a lightweight research agent while delegating complementary roles to specialized supporter agents for exploration, hyperparameter tuning, and reference policy management. Our core algorithm, Evolving Direct Preference Optimization (EvoDPO), adaptively updates the phase-indexed reference policy. We provide a theoretical regret analysis for a preference-based contextual bandit under concept drift. In addition, experiments were conducted on non-stationary linear contextual bandits and scientific machine learning (SciML) loss reweighting for the 1D Burgers' equation. Both results show that ATLAS improves stability and performance over a static single-agent baseline.
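The abstract names the ingredients of EvoDPO (a DPO-style preference loss and a phase-indexed reference policy maintained by supporter agents) without implementation details. The sketch below is one minimal way to wire those pieces together, assuming the standard DPO objective and a reference policy that is re-anchored to the current policy at phase boundaries; the names `dpo_loss` and `PhaseIndexedReference` are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of an EvoDPO-style setup: standard DPO preference loss
# plus a reference policy that is re-anchored at phase boundaries.
# Names and structure are illustrative, not the authors' implementation.
import copy
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss from per-example sequence log-probabilities
    under the current policy and the (frozen) reference policy."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()


class PhaseIndexedReference:
    """Keeps a frozen snapshot of the policy as the DPO reference and
    re-anchors it when a new phase begins (e.g., when a supporter agent
    flags concept drift); this re-anchoring is the 'evolving' part."""

    def __init__(self, policy: torch.nn.Module):
        self.phase = 0
        self.model = self._freeze(policy)

    @staticmethod
    def _freeze(policy):
        ref = copy.deepcopy(policy).eval()
        for p in ref.parameters():
            p.requires_grad_(False)
        return ref

    def advance_phase(self, policy):
        # Re-anchor the reference to the current policy for the next phase.
        self.model = self._freeze(policy)
        self.phase += 1
```

In a training loop, a trainer would score each preference pair under both the current policy and `PhaseIndexedReference.model`, minimize `dpo_loss`, and let a supporter agent decide when to call `advance_phase`.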
Related papers
- AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization [61.535567824938205]
We introduce AdaEvolve, a framework that reformulates LLM-driven evolution as a hierarchical adaptive optimization problem. AdaEvolve consistently outperforms the open-ended baselines across 185 different open-ended optimization problems.
arXiv Detail & Related papers (2026-02-23T18:45:31Z)
- AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering [52.67783579040657]
AceGRPO is a machine learning system that prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines.
arXiv Detail & Related papers (2026-02-08T10:55:03Z)
- ContextEvolve: Multi-Agent Context Compression for Systems Code Optimization [8.046059974853858]
We introduce ContextEvolve, a multi-agent framework that achieves RL-level search efficiency under strict parameter-blind constraints. On the ADRS benchmark, ContextEvolve outperforms state-of-the-art baselines by 33.3% while reducing token consumption by 29.0%.
arXiv Detail & Related papers (2026-02-01T16:50:07Z)
- HiVA: Self-organized Hierarchical Variable Agent via Goal-driven Semantic-Topological Evolution [13.440964262446558]
Hierarchical Variable Agent (HiVA) is a novel framework that models agentic workflows as self-organized graphs with the Semantic-Topological Evolution (STEV) algorithm. Experiments on dialogue, coding, long-context Q&A, mathematical, and agentic benchmarks demonstrate improvements of 5-10% in task accuracy and enhanced resource efficiency.
arXiv Detail & Related papers (2025-08-29T18:51:18Z)
- STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on the MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI systems, which autonomously plan and act, are becoming widespread, yet their success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly incorporates feedback extracted from different forms of critique.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
- DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2 (see the pass@k sketch after this list).
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
- A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops [3.729242965449096]
This paper introduces a framework for autonomously optimizing Agentic AI solutions across industries. The framework achieves optimal performance without human input by autonomously generating and testing hypotheses. Case studies show significant improvements in output quality, relevance, and actionability.
arXiv Detail & Related papers (2024-12-22T20:08:04Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
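For context on the pass@k figure quoted above (DARS on SWE-Bench Lite), the snippet below computes pass@k with the standard unbiased estimator popularized by the Codex evaluation (Chen et al., 2021); whether each paper uses exactly this estimator is not stated in these summaries.

```python
# Unbiased pass@k estimator, shown only as background for the pass@k
# numbers reported above; the exact evaluation protocol of DARS is an
# assumption, not taken from the paper summary.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n generated
    solutions, c of which are correct, passes: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 attempts per task, 3 correct.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
print(round(pass_at_k(10, 3, 5), 3))  # 0.917
```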