Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol
- URL: http://arxiv.org/abs/2602.13320v1
- Date: Tue, 10 Feb 2026 21:08:53 GMT
- Title: Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol
- Authors: Flint Xiaofeng Fan, Cheston Tan, Roger Wattenhofer, Yew-Soon Ong
- Abstract summary: We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents. We show that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(\sqrt{T})$. Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
- Score: 69.11739400975445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As AI agents powered by large language models (LLMs) increasingly use external tools for high-stakes decisions, a critical reliability question arises: how do errors propagate across sequential tool calls? We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents, proving that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(\sqrt{T})$. This concentration property ensures predictable system behavior and rules out exponential failure modes. We develop a hybrid distortion metric combining discrete fact matching with continuous semantic similarity, then establish martingale concentration bounds on error propagation through sequential tool interactions. Experiments across Qwen2-7B, Llama-3-8B, and Mistral-7B validate our theoretical predictions, showing empirical distortion tracks the linear trend with deviations consistently within $O(\sqrt{T})$ envelopes. Key findings include: semantic weighting reduces distortion by 80\%, and periodic re-grounding approximately every 9 steps suffices for error control. We translate these concentration guarantees into actionable deployment principles for trustworthy agent systems.
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, covering both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z) - Beyond Confidence: The Rhythms of Reasoning in Generative Models [16.58205184223738]
Large Language Models (LLMs) exhibit impressive capabilities yet suffer from sensitivity to slight input context variations, hampering reliability. We introduce the Token Constraint Bound ($\mathrm{TCB}$), a novel metric that quantifies the maximum internal-state perturbation an LLM can withstand before its dominant next-token prediction changes significantly. Our experiments show $\mathrm{TCB}$ correlates with effective prompt engineering and uncovers critical prediction instabilities missed by perplexity during in-context learning and text generation.
arXiv Detail & Related papers (2026-02-11T12:58:23Z) - Generation Order and Parallel Decoding in Masked Diffusion Models: An Information-Theoretic Perspective [16.942478643768144]
Masked Diffusion Models (MDMs) significantly accelerate inference by trading off sequential determinism. We provide a unified information-theoretic framework to decouple and analyze two fundamental sources of failure: order sensitivity and parallelization bias.
arXiv Detail & Related papers (2026-01-30T20:15:18Z) - Phase Transition for Budgeted Multi-Agent Synergy [41.486076708302456]
Multi-agent systems can improve reliability, yet under a fixed inference budget they often help, saturate, or even collapse. We develop a minimal and calibratable theory that predicts these regimes from three binding constraints of modern agent stacks.
arXiv Detail & Related papers (2026-01-24T05:32:50Z) - Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective [60.45433515408158]
We show that long Chain-of-Thought (CoT) serves as a decisive decision-maker for the top option but fails to function as a granular distribution calibrator for ambiguous tasks. We observe a distinct "decoupled mechanism": while CoT improves distributional alignment, final accuracy is dictated by CoT content.
arXiv Detail & Related papers (2026-01-06T16:26:40Z) - Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting [44.23640219583819]
Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. We propose Entropy-Adaptive Fine-Tuning (EAFT) to solve this problem. EAFT consistently matches the downstream performance of standard SFT while significantly mitigating the degradation of general capabilities.
arXiv Detail & Related papers (2026-01-05T14:28:17Z) - Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis [55.13545823385091]
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. In real-world applications, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. We show that even moderate levels of information sharing significantly mitigate environment-specific errors.
arXiv Detail & Related papers (2025-03-21T18:06:28Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Generalization Bounds in the Presence of Outliers: a Median-of-Means
Study [8.905677748354364]
The Median-of-Means (MoM) is an estimator of the mean $\theta$ of a square-integrable random variable $Z$.
Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning.
A new line of work is now trying to characterize and leverage MoM's ability to deal with corrupted data.
arXiv Detail & Related papers (2020-06-09T13:21:39Z)
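The Median-of-Means idea above is simple to sketch: split the sample into blocks, average each block, and return the median of the block means, which tolerates heavy tails and a few corrupted points. The block count `n_blocks` is a tuning parameter; this is an illustration, not the paper's exact estimator:

```python
# Minimal Median-of-Means sketch. Splits the data into n_blocks blocks,
# averages each, and takes the median of the block means. Robust to a
# small number of outliers, unlike the plain sample mean.
import statistics

def median_of_means(data, n_blocks: int) -> float:
    if n_blocks < 1 or n_blocks > len(data):
        raise ValueError("n_blocks must be between 1 and len(data)")
    block_size = len(data) // n_blocks
    means = [
        sum(data[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    return statistics.median(means)
```

For example, with nine samples equal to 1.0 and a single outlier of 1000.0, the plain mean is pulled to roughly 100.9, while `median_of_means` over five blocks still returns 1.0.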
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.