Related papers: Fleet of Agents: Coordinated Problem Solving with Large Language Models

Related papers

AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents [75.67445299298949]
AgentCPM-Explore is a compact 4B agent model with high knowledge density and strong exploration capability.<n>We introduce a holistic training framework featuring parameter-space model fusion, reward signal denoising, and contextual information refinement.<n>AgentCPM-Explore achieves state-of-the-art (SOTA) performance among 4B-class models, matches or surpasses 8B-class SOTA models on four benchmarks, and even outperforms larger-scale models such as Claude-4.5-Sonnet or DeepSeek-v3.2 in five benchmarks.
arXiv Detail & Related papers (2026-02-06T08:24:59Z)
AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search [19.443576967819684]
LLM agents have emerged as powerful systems for tackling multi-turn tasks by interleaving internal reasoning and external tool interactions.<n>We present AT$2$PO, a unified framework for multi-turn agentic RL that addresses three core challenges: limited exploration diversity, sparse credit assignment, and misaligned policy optimization.
arXiv Detail & Related papers (2026-01-08T09:35:49Z)
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning [40.6234318894435]
Large language models split into two families: reasoning-centric LLMs and agentic LLMs.<n>This divide arises from fundamentally different training objectives, leading to mismatched strengths and inefficiency on simple queries.<n>We present Adaptive Agent Foundation Model (A$2$FM), a unified framework that follows a route-then-align principle.
arXiv Detail & Related papers (2025-10-13T17:08:25Z)
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research.<n>Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z)
MinionsLLM: a Task-adaptive Framework For The Training and Control of Multi-Agent Systems Through Natural Language [2.4171019220503402]
MinionsLLM provides standardized interfaces for defining environments, agents, and behavioral primitives.<n>It integrates Large Language Models (LLMs) with Behavior Trees (BTs) and Formal Grammars to enable natural language control of multi-agent systems.
arXiv Detail & Related papers (2025-08-01T13:10:29Z)
Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One [28.264011412168347]
Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents.<n>We propose LLM-Ens, a novel approach that enhances RL model ensemble with task-specific semantic understandings.
arXiv Detail & Related papers (2025-05-21T09:35:43Z)
Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications [0.7124971549479361]
This study introduces a framework for evaluating consistency in large language model (LLM) binary text classification.<n>We determine sample size requirements, develop metrics for invalid responses, and evaluate intra- and inter-rater reliability.
arXiv Detail & Related papers (2025-05-20T21:12:58Z)
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.65034908728828]
Training large language models (LLMs) as interactive agents presents unique challenges. While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z)
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't [0.0]
Our study investigates the potential of reinforcement learning to improve reasoning in small language models (LLMs) Training on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours resulted in rapid reasoning gains. These findings highlight the efficacy of RL-based fine-tuning for small LLMs, offering a cost-effective alternative to large-scale approaches.
arXiv Detail & Related papers (2025-03-20T15:13:23Z)
Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options [1.2289361708127877]
Flow-of-Options (FoO) is designed to address intrinsic biases in Large Language Models (LLMs) Our framework outperforms state-of-the-art baselines, achieving improvements of 38.2% - 69.2% on standard data science tasks. With an overall operation cost under $1 per task, our framework is well-suited for cost-sensitive applications.
arXiv Detail & Related papers (2025-02-18T15:11:46Z)
Multi-Agent Reinforcement Learning with Focal Diversity Optimization [7.498844064516196]
We introduce a focal diversity-optimized multi-agent reinforcement learning approach, coined as MARL-Focal. Our model achieves performance improvement of 5.51% compared to the best individual LLM-agent.
arXiv Detail & Related papers (2025-02-06T20:44:26Z)
Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
Enhancing Financial Question Answering with a Multi-Agent Reflection Framework [0.0]
We propose a multi-agent framework incorporating a critic agent that reflects on the reasoning steps and final answers for each question. Our results indicate that this framework significantly improves performance compared to single-agent reasoning. Our framework performs on par with, and in some cases surpasses, larger single-agent LLMs such as LLaMA3.1-405B and GPT-4o-mini.
arXiv Detail & Related papers (2024-10-29T04:58:07Z)
GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak)
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks. Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios. We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search.
arXiv Detail & Related papers (2024-05-20T04:36:02Z)
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [56.00992369295851]
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents. This paper delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations. We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
arXiv Detail & Related papers (2024-03-19T16:26:10Z)
Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques. Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks. We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z)
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment. Standard sampling-based algorithms can be used to search for solutions in the robots' joint space. We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods. We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z)
Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency. We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem. Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Loosely Synchronized Search for Multi-agent Path Finding with Asynchronous Actions [10.354181009277623]
Multi-agent path finding (MAPF) determines an ensemble of collision-free paths for multiple agents between their respective start and goal locations. This article presents a natural generalization of MAPF with asynchronous actions where agents do not necessarily start and stop concurrently.
arXiv Detail & Related papers (2021-03-08T02:34:17Z)
Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose the first robust AutoML framework, Robusta--based on reinforcement learning (RL) We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z)
Optimizing Cooperative path-finding: A Scalable Multi-Agent RRT* with Dynamic Potential Fields [11.872579571976903]
This study introduces the multi-agent RRT* potential field (MA-RRT*PF), an innovative algorithm that addresses computational efficiency and path-finding efficacy in dense scenarios. The empirical evaluations highlight MA-RRT*PF's significant superiority over conventional multi-agent RRT* (MA-RRT*) in dense environments.
arXiv Detail & Related papers (2019-11-16T13:02:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.