Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement
- URL: http://arxiv.org/abs/2510.27051v1
- Date: Thu, 30 Oct 2025 23:41:06 GMT
- Title: Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement
- Authors: Aaditya Shukla, Sidney Knowles, Meenakshi Madugula, Dave Farris, Ryan Angilly, Santiago Pombo, Anbang Xu, Lu An, Abhinav Balasubramanian, Tan Yu, Jiaxiang Ren, Rama Akkiraju,
- Abstract summary: We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees.<n>We built a closed-loop system that addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning.<n>For routing, we replaced a Llama 3.1B model with a fine-tuned 8B variant, achieving 96% accuracy, a 10x reduction in model size, and 70% latency improvement.
- Score: 8.230420096371407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees. By operationalizing a MAPE-driven data flywheel, we built a closed-loop system that systematically addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning. Over a 3-month post-deployment period, we monitored feedback and collected 495 negative samples. Analysis revealed two major failure modes: routing errors (5.25\%) and query rephrasal errors (3.2\%). Using NVIDIA NeMo microservices, we implemented targeted improvements through fine-tuning. For routing, we replaced a Llama 3.1 70B model with a fine-tuned 8B variant, achieving 96\% accuracy, a 10x reduction in model size, and 70\% latency improvement. For query rephrasal, fine-tuning yielded a 3.7\% gain in accuracy and a 40\% latency reduction. Our approach demonstrates how human-in-the-loop (HITL) feedback, when structured within a data flywheel, transforms enterprise AI agents into self-improving systems. Key learnings include approaches to ensure agent robustness despite limited user feedback, navigating privacy constraints, and executing staged rollouts in production. This work offers a repeatable blueprint for building robust, adaptive enterprise AI agents capable of learning from real-world usage at scale.
Related papers
- AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis [19.899469614370478]
Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE)<n>We present AOI, a trainable multi-agent framework formulating automated operations as a structured trajectory learning problem under security constraints.
arXiv Detail & Related papers (2026-03-03T02:57:33Z) - AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering [52.67783579040657]
AceGRPO is a machine learning system that prioritizes tasks at the agent's learning frontier to maximize learning efficiency.<n>Our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines.
arXiv Detail & Related papers (2026-02-08T10:55:03Z) - ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment [1.6968020497268546]
ROAD is a novel framework that treats optimization as a dynamic debug investigation rather than a search.<n>Road is highly sample-efficient, achieving a 5.6 percent increase in success rate and a 3.8 percent increase in search accuracy.<n>These findings suggest that mimicking the human engineering loop of failure analysis and patching offers a viable, data-efficient alternative to resource-intensive training.
arXiv Detail & Related papers (2025-12-30T07:31:34Z) - CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving [32.564440450582346]
We introduce PM-Agent, which formulates data requirements to collect data similar to failure cases.<n>We then use a generative model that can simulate both data collection and annotation.<n>To address this, we propose DriveSora, which can generate consistent videos aligned with the 3D annotations requested by PM-Agent.<n>We integrate these components into our self-correcting agentic, system CorrectAD.
arXiv Detail & Related papers (2025-11-17T12:21:03Z) - Structured Uncertainty guided Clarification for LLM Agents [126.26213027785813]
LLM agents extend large language models with tool-calling capabilities, but ambiguous user instructions often lead to incorrect invocations and task failures.<n>We introduce a principled formulation of structured uncertainty over tool-call parameters, modeling joint tool-argument clarification as a POMDP with Expected Value of Perfect Information (EVPI) objective for optimal question selection and aspect-based cost modeling to prevent redundancy.<n>Our SAGE-Agent leverages this structured uncertainty to achieve superior efficiency: increasing coverage on ambiguous tasks by 7-39% while reducing clarification questions by 1.5-2.7$times$ compared to strong prompting and uncertainty-based baselines.
arXiv Detail & Related papers (2025-11-11T21:50:44Z) - Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation [65.3648667980258]
Vision-language model (VLM) based GUI agents show promise for automating complex tasks, but face significant challenges in applying reinforcement learning (RL)<n>We propose DART, a Decoupled Agentic RL Training framework for GUI agents, which coordinates heterogeneous modules in a highly decoupled manner.<n>On the OSWorld benchmark, DART-GUI-7B achieves a 42.13% task success rate, a 14.61% absolute gain over the base model, and 7.34% higher than open-source SOTA.
arXiv Detail & Related papers (2025-09-28T13:19:20Z) - Adaptive Monitoring and Real-World Evaluation of Agentic AI Systems [3.215065407261898]
Multi-agent systems that combine large language models with external tools are rapidly transitioning from research laboratories into high-stakes domains.<n>This "Advanced" sequel fills that gap by providing an algorithmic instantiation or empirical evidence.<n>AMDM cuts anomaly-detection latency from 12.3 s to 5.6 s on simulated goal drift and reduces false-positive rates from 4.5% to 0.9%.
arXiv Detail & Related papers (2025-08-28T15:52:49Z) - Action Flow Matching for Continual Robot Learning [54.10050120844738]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks.<n>We introduce a generative framework leveraging flow matching for online robot dynamics model alignment.<n>We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z) - FlowTS: Time Series Generation via Rectified Flow [67.41208519939626]
FlowTS is an ODE-based model that leverages rectified flow with straight-line transport in probability space.<n>For unconditional setting, FlowTS achieves state-of-the-art performance, with context FID scores of 0.019 and 0.011 on Stock and ETTh datasets.<n>For conditional setting, we have achieved superior performance in solar forecasting.
arXiv Detail & Related papers (2024-11-12T03:03:23Z) - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z) - Anomaly Detection for Incident Response at Scale [1.284857579394658]
We present a machine learning-based anomaly detection product that monitors Walmart's business and system health in real-time.
During the validation over 3 months, the product served predictions from over 3000 models to more than 25 application, platform, and operation teams.
AIDR has achieved success with various internal teams with lower time to detection and fewer false positives than previous methods.
arXiv Detail & Related papers (2024-04-24T00:46:19Z) - DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep
Surrogate Model [12.335763358698564]
We propose DeepFT to proactively avoid system overloads and their adverse effects.
DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system.
It offers a highly scalable solution as the model size scales by only 3 and 1 percent per unit increase in the number of active tasks and hosts.
arXiv Detail & Related papers (2022-12-02T16:51:58Z) - Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm
Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.