Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
- URL: http://arxiv.org/abs/2507.06229v5
- Date: Mon, 27 Oct 2025 06:16:14 GMT
- Title: Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
- Authors: Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, Wangchunshu Zhou
- Abstract summary: We introduce AGENT KB, a universal memory infrastructure enabling seamless experience sharing across heterogeneous agent frameworks without retraining. AGENT KB aggregates trajectories into a structured knowledge base and serves lightweight APIs. We validate AGENT KB across major frameworks on GAIA, Humanity's Last Exam, GPQA, and SWE-bench.
- Score: 62.71545696485824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI agent frameworks operate in isolation, forcing agents to rediscover solutions and repeat mistakes across different systems. Despite valuable problem-solving experiences accumulated by frameworks like smolagents, OpenHands, and OWL, this knowledge remains trapped within individual systems, preventing the emergence of collective intelligence. Current memory systems focus on individual agents or framework-specific demonstrations, failing to enable cross-architecture knowledge transfer. We introduce AGENT KB, a universal memory infrastructure enabling seamless experience sharing across heterogeneous agent frameworks without retraining. AGENT KB aggregates trajectories into a structured knowledge base and serves lightweight APIs. At inference time, hybrid retrieval operates through two stages: planning seeds agents with cross-domain workflows, while feedback applies targeted diagnostic fixes. A disagreement gate ensures retrieved knowledge enhances rather than disrupts reasoning, addressing knowledge interference in cross-framework transfer. We validate AGENT KB across major frameworks on GAIA, Humanity's Last Exam, GPQA, and SWE-bench. Results show substantial improvements across diverse model families: compared to baseline pass@1, smolagents with AGENT KB achieve up to 18.7pp gains at pass@3 (55.2% -> 73.9%), while OpenHands improves 4.0pp on SWE-bench pass@1 (24.3% -> 28.3%). Similar improvements are observed across all base model families. Ablations confirm that hybrid retrieval and feedback stages are essential, with automatically generated experiences matching manual curation. This establishes the foundation for collective agent intelligence through shared memory infrastructures.
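The two-stage retrieval and the disagreement gate described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the AGENT KB implementation or its API: the `Experience` record, the token-overlap scoring used as a stand-in for hybrid retrieval, the `retrieve` and `disagreement_gate` helpers, and the 0.2 threshold are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical knowledge-base record (not the paper's schema)."""
    task: str    # description of the task the experience came from
    advice: str  # workflow hint (planning) or diagnostic fix (feedback)
    stage: str   # "planning" or "feedback"


def tokens(text):
    return text.lower().split()


def jaccard(a, b):
    """Set overlap between the token sets of two strings."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(query, doc, alpha=0.5):
    """Assumed stand-in for hybrid retrieval: a blend of two lexical scores."""
    return alpha * jaccard(query, doc) + (1 - alpha) * cosine(query, doc)


def retrieve(kb, query, stage, k=1):
    """Return the top-k experiences of the requested stage for a query."""
    pool = [e for e in kb if e.stage == stage]
    return sorted(pool, key=lambda e: hybrid_score(query, e.task), reverse=True)[:k]


def disagreement_gate(current_plan, candidate, threshold=0.2):
    """Assumed gate rule: keep advice only if it relates to the current plan."""
    return hybrid_score(current_plan, candidate.advice) >= threshold


kb = [
    Experience("extract a table from a pdf report",
               "convert the pdf to text before parsing tables", "planning"),
    Experience("fix a failing unit test in a python repo",
               "reproduce the failure before editing code", "planning"),
    Experience("extract a table from a pdf report",
               "check that column headers were not merged", "feedback"),
]

# Planning stage: seed the agent with a workflow from a similar past task.
seed = retrieve(kb, "extract a table from a pdf", "planning")[0]
# Gate: only inject the seed if it does not conflict with the agent's own plan.
use_seed = disagreement_gate("parse tables from the pdf text", seed)
```

In this sketch the gate is a plain similarity threshold; the abstract only states that a gate exists to keep retrieved knowledge from disrupting reasoning, so the actual criterion may differ.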
Related papers
- DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems [48.971606069204825]
DoVer is an intervention-driven debug framework for large language model (LLM)-based multi-agent systems. It augments hypothesis generation with active verification through targeted interventions. DoVer flips 18-28% of failed trials into successes, achieves up to 16% milestone progress, and validates or refutes 30-60% of failure hypotheses.
arXiv Detail & Related papers (2025-12-07T09:23:48Z)
- Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert. In this framework, a generalist agent executes a curated suite of target-domain tasks. It attains strong gains while reducing computation costs.
arXiv Detail & Related papers (2025-10-27T17:59:14Z)
- Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors. We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z)
- Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents [43.806220882212386]
RLVMR integrates dense, process-level supervision into end-to-end RL by rewarding verifiable, meta-reasoning behaviors. On the challenging ALFWorld and ScienceWorld benchmarks, RLVMR achieves new state-of-the-art results.
arXiv Detail & Related papers (2025-07-30T17:00:48Z)
- From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases [8.640991293068248]
Supply chain operations generate vast amounts of operational data. Critical knowledge such as system usage practices, troubleshooting, and resolution techniques often remains buried within unstructured communications. RAG systems aim to leverage such communications as a knowledge base, but their effectiveness is limited by raw data challenges. We introduce a novel offline-first methodology that transforms these communications into a structured knowledge base.
arXiv Detail & Related papers (2025-06-20T21:38:06Z)
- OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation [65.15955645757705]
We introduce Workforce, a hierarchical multi-agent framework that decouples strategic planning from specialized execution. During inference, Workforce seamlessly adapts to new domains by adding or modifying worker agents. For training, we introduce Optimized Workforce Learning (OWL), which improves generalization across domains.
arXiv Detail & Related papers (2025-05-29T17:51:58Z)
- Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents [10.378745306569053]
VulTrial is a courtroom-inspired framework designed to enhance automated vulnerability detection. It employs four role-specific agents: security researcher, code author, moderator, and review board. Using GPT-3.5 and GPT-4o, VulTrial improves the performance by 102.39% and 84.17% over its respective baselines.
arXiv Detail & Related papers (2025-05-16T07:54:10Z)
- Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks [11.125564622217892]
Large Language Model agents improve by learning from their own successful experiences without human intervention. Our method constructs and refines a database of self-generated trajectories that serve as in-context examples for future tasks. Our trajectory bootstrapping technique demonstrates that agents can autonomously improve through experience, offering a scalable alternative to labor-intensive knowledge engineering.
arXiv Detail & Related papers (2025-05-01T00:48:12Z)
- Exploring Expert Failures Improves LLM Agent Tuning [74.0772570556016]
We propose Exploring Expert Failures (EEF), which identifies beneficial actions from failed expert trajectories. EEF successfully solves some previously unsolvable subtasks and improves agent tuning performance.
arXiv Detail & Related papers (2025-04-17T17:53:54Z)
- A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning [7.923577336744156]
We propose a dual-agent adversarial policy learning framework. This framework allows agents to spontaneously learn the underlying semantics without introducing any human prior knowledge. Experiments show that the adversarial process significantly improves the generalization performance of both agents.
arXiv Detail & Related papers (2025-01-29T02:36:47Z)
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
We present KBAlign, a self-supervised framework that enhances RAG systems through efficient model adaptation. Our key insight is to leverage the model's intrinsic capabilities for knowledge alignment through two innovative mechanisms. Experiments demonstrate that KBAlign can achieve 90% of the performance gain obtained through GPT-4-supervised adaptation.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents [106.87436596397816]
Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems.
We propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise.
Experiments show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin.
arXiv Detail & Related papers (2024-08-13T17:50:28Z)
- On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents. The impact of clumsy or even malicious agents (those who frequently make errors in their tasks) on the overall performance of the system remains underexplored. This paper investigates the resilience of various system structures under faulty agents on different downstream tasks.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Devil's Advocate: Anticipatory Reflection for LLM Agents [53.897557605550325]
Our approach prompts LLM agents to decompose a given task into manageable subtasks.
We implement a three-fold introspective intervention:
Anticipatory reflection on potential failures and alternative remedy before action execution.
Post-action alignment with subtask objectives and backtracking with remedy to ensure utmost effort in plan execution.
arXiv Detail & Related papers (2024-05-25T19:20:15Z)
- 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System [71.96888731208838]
We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
arXiv Detail & Related papers (2024-04-08T14:43:13Z)
- Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering [42.277248862366164]
Triad is a unified framework that utilizes an LLM-based agent with three roles for KBQA tasks.
Our framework is executed in four phases, involving the collaboration of the agent's multiple roles.
arXiv Detail & Related papers (2024-02-22T06:23:37Z)
- KT-BT: A Framework for Knowledge Transfer Through Behavior Trees in Multi-Robot Systems [0.0]
Multi-Robot and Multi-Agent Systems demonstrate collective (swarm) intelligence through systematic and distributed integration of local behaviors.
This paper presents a new knowledge representation framework and a transfer strategy called KT-BT: Knowledge Transfer through Behavior Trees.
arXiv Detail & Related papers (2022-09-07T02:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.