Auditable Early Stopping for Agentic Routing: Ledger-Verified Run-Wise Certificates under Local DP
- URL: http://arxiv.org/abs/2509.10550v2
- Date: Tue, 16 Sep 2025 02:41:43 GMT
- Title: Auditable Early Stopping for Agentic Routing: Ledger-Verified Run-Wise Certificates under Local DP
- Authors: Shivam Akhauri
- Abstract summary: We address when a best-first router for tool-use agents can stop exploring without missing a better leaf. We introduce a run-wise certificate that couples each node's key to the same exponential race that realizes leaf perturbations. Experiments on synthetic graphs and a small real pipeline show tight stopping, deterministic replay, and low overhead.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address when a best-first router for tool-use agents can stop exploring without missing a better leaf, while preserving local differential privacy (LDP) and leaving an audit trail. We introduce a run-wise certificate that couples each node's key to the same exponential race that realizes leaf perturbations; the usual halting rule (stop when $\max_{v \in F} \mathrm{Key}(v) \le B^*$) then certifies the realized run. We give two certified modes on context-indexed prefix DAGs with child partition: (i) Exact (known counts), using lazy offset propagation with winner reuse; and (ii) Surrogate (upper bounds only), which anchors keys to a parent-level surrogate race and allows validator tightening via $\kappa = \log(N / N_{\mathrm{ub}})$. A small compiler enforces the partition property, and an admissible, race-independent $M(\tau)$ keeps keys sound. The ledger logs uniforms, counts, and tie handling; privacy follows by post-processing. Experiments on synthetic graphs and a small real pipeline show tight stopping, deterministic replay, and low overhead.
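As a minimal illustration of the halting rule, here is a toy best-first search that stops once no frontier key can beat the best realized leaf $B^*$. This is only a sketch of the generic stop rule: the perturbation coupling, certified modes, and ledger from the paper are not modeled, and all function names (`children`, `leaf_value`, `key_upper_bound`) are hypothetical.

```python
import heapq
import math

def best_first_with_halting(root, children, leaf_value, key_upper_bound):
    """Toy best-first search with the certificate-style stop rule:
    halt once max over frontier of Key(v) <= B* (best realized leaf).

    children(v)        -> list of child nodes ([] for a leaf)
    leaf_value(v)      -> realized value of leaf v
    key_upper_bound(v) -> admissible upper bound on every leaf below v
    """
    best = -math.inf                               # B*
    frontier = [(-key_upper_bound(root), root)]    # max-heap via negation
    expanded = 0
    while frontier:
        neg_key, v = heapq.heappop(frontier)
        if -neg_key <= best:                       # no remaining key beats B*
            break
        expanded += 1
        kids = children(v)
        if not kids:
            best = max(best, leaf_value(v))
        else:
            for c in kids:
                heapq.heappush(frontier, (-key_upper_bound(c), c))
    return best, expanded

# Depth-3 binary tree over bit-strings; a leaf's value is its integer reading,
# and a node's key is the largest leaf value reachable below it.
DEPTH = 3
best, expanded = best_first_with_halting(
    "",
    lambda v: [v + "0", v + "1"] if len(v) < DEPTH else [],
    lambda v: int(v, 2),
    lambda v: int(v + "1" * (DEPTH - len(v)), 2),
)
```

With exact upper bounds the search walks straight to the best leaf (value 7) and halts after expanding only the nodes on that path, which is the "tight stopping" behavior the experiments measure.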
Related papers
- UC-Secure Star DKG for Non-Exportable Key Shares with VSS-Free Enforcement [0.0]
UC-secure Distributed Key Generation (DKG) lets parties derive a common public key while keeping the signing key secret-shared. We target the Non-eXportable Key (NXK) setting enforced by hardware-backed key-isolation modules. We construct Star DKG (SDKG) for multi-device threshold wallets where a designated service must co-sign but cannot sign alone.
arXiv Detail & Related papers (2026-02-25T18:32:42Z) - MerkleSpeech: Public-Key Verifiable, Chunk-Localised Speech Provenance via Perceptual Fingerprints and Merkle Commitments [0.0]
We propose MerkleSpeech, a system for public-key verifiable, chunk-localised speech provenance. The system computes perceptual fingerprints over short speech chunks and commits them in a Merkle tree whose root is signed with an issuer key. We present experiments targeting very low false-positive rates under resampling, bandpass filtering, and additive noise.
arXiv Detail & Related papers (2026-02-10T11:58:19Z) - SOCKET: SOft Collison Kernel EsTimator for Sparse Attention [25.278711498381494]
Exploiting sparsity during long-context inference is central to scaling large language models. Locality-Sensitive Hashing (LSH) is a natural sparsification primitive, but hard bucket matches discard similarity information. SOCKET is a SOft Collision EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregation.
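The summary above does not give SOCKET's estimator, but the idea of a soft, similarity-aware collision weight can be sketched with random-hyperplane (SimHash) LSH: for unit vectors at angle $\theta$, each hash bit agrees with probability $1-\theta/\pi$, so the bit-agreement fraction yields a similarity estimate instead of a hard same-bucket indicator. All names and parameters here are illustrative.

```python
import numpy as np

def soft_collision_weights(q, K, n_bits=512, seed=0):
    """Estimate query-key cosine similarity from SimHash bit agreement.

    For random-hyperplane LSH, P(bit agrees) = 1 - theta/pi, so the
    agreement fraction gives an angle estimate; we return cos(theta_hat)
    as a soft weight per key rather than a hard bucket match.
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((n_bits, q.shape[0]))  # shared hyperplanes
    q_bits = (H @ q) > 0                           # (n_bits,)
    k_bits = (H @ K.T) > 0                         # (n_bits, n_keys)
    agree = (q_bits[:, None] == k_bits).mean(axis=0)
    theta_hat = np.pi * (1.0 - agree)
    return np.cos(theta_hat)

# Identical, opposite, and orthogonal keys relative to the query.
q = np.array([1.0, 0.0, 0.0])
K = np.array([[1.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
w = soft_collision_weights(q, K)
```

A hard bucket match would score the orthogonal key as either "collides" or "does not"; the soft estimate instead places it near zero weight while keeping the identical key near one.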
arXiv Detail & Related papers (2026-02-06T00:41:44Z) - FASA: Frequency-aware Sparse Attention [56.26881872333624]
We propose FASA, a novel framework that achieves query-aware token eviction by dynamically predicting token importance. Our key finding is that a small, identifiable subset of "dominant" FCs consistently exhibits high contextual agreement with the full attention head. Across a spectrum of long-context tasks, FASA consistently outperforms all token-eviction baselines and achieves near-oracle accuracy.
arXiv Detail & Related papers (2026-02-03T06:09:06Z) - Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models [26.81598226089532]
K-OTG trains on a dual-path corpus: authorized examples with a role key learn the task output, while unauthorized examples learn a visible block token. Keys are not added as special tokens, and the method composes cleanly with LoRA on 4-bit bases.
arXiv Detail & Related papers (2025-12-19T12:42:53Z) - Natural Language Edge Labelling: Decoupling Intent from Execution in Structured LM Reasoning [0.0]
We introduce Natural Language Edge Labelling (NLEL), a labeller-tuner overlay that attaches a free-form natural-language directive to each search edge. We show NLEL strictly generalizes CoT/ToT, prove an anytime-monotonicity property for top-$k$ selection under label-conditioned bundles, and bound selector shortfall by control-vector distortion.
arXiv Detail & Related papers (2025-10-06T14:00:02Z) - Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers [90.50039419576807]
Reinforcement Learning with Verifiable Rewards (RLVR) trains policies against automated verifiers to avoid costly human labeling. To reduce vulnerability to verifier hacking, many RLVR systems collapse rewards to binary $\{0,1\}$ during training. This choice carries a cost: it introduces false negatives (rejecting correct answers, FNs) and false positives (accepting incorrect ones, FPs).
arXiv Detail & Related papers (2025-10-01T13:56:44Z) - $\mathrm{TIME}[t]\subseteq \mathrm{SPACE}[O(\sqrt{t})]$ via Tree Height Compression [0.0]
An Algebraic Replay Engine with constant-degree maps over a constant-size field, together with pointerless DFS and index-free streaming, ensures constant-size per-level tokens. $S(b)=O(b+t/b)$ gives $O(\sqrt{t})$ space with no residual multiplicative polylog factors.
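The $S(b)=O(b+t/b)$ trade-off is minimized at $b \approx \sqrt{t}$, giving $S \approx 2\sqrt{t}$. A quick numeric check (the brute-force search is purely illustrative):

```python
def best_block_size(t):
    """Brute-force minimize S(b) = b + t/b over integer block sizes b."""
    b = min(range(1, t + 1), key=lambda k: k + t / k)
    return b, b + t / b

# For t = 10_000, the minimizer should sit at b = sqrt(t) = 100,
# with S(b) = 100 + 10_000/100 = 200 = 2*sqrt(t).
b, s = best_block_size(10_000)
```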
arXiv Detail & Related papers (2025-08-20T16:27:53Z) - DP-Fusion: Token-Level Differentially Private Inference for Large Language Models [51.71591819896191]
Large language models (LLMs) do not preserve privacy at inference time. DP-Fusion provably bounds the influence that a set of tokens in the context can have on the LLM's output. We show that our method creates token-level, provably privatized documents with substantially improved theoretical and empirical privacy.
arXiv Detail & Related papers (2025-07-06T20:49:39Z) - Exact Certification of (Graph) Neural Networks Against Label Poisoning [50.87615167799367]
We introduce an exact certification method for label flipping in Graph Neural Networks (GNNs). We apply our method to certify a broad range of GNN architectures in node classification tasks. Our work presents the first exact certificate against a poisoning attack ever derived for neural networks.
arXiv Detail & Related papers (2024-11-30T17:05:12Z) - Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes [12.229154524476405]
We introduce a variant of the Restarted Bayesian Online Change-Point Detection algorithm (R-BOCPD).
We propose an improved version of the UCRL2 algorithm for MDPs with state transition kernel sampled from a multinomial distribution.
We show that R-BOCPD-UCRL2 enjoys a favorable regret bound of $O\big(D\, O\, \sqrt{A T K_T \log\left(\frac{T}{\delta}\right)} + \frac{K_T}{\delta} \min\limits_{\ell} \ldots\big)$.
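R-BOCPD itself monitors MDP transition kernels, but the underlying mechanism can be illustrated with a minimal Adams-MacKay-style Bayesian online change-point filter on Bernoulli data. The hazard rate, Beta prior, and Bernoulli likelihood here are illustrative choices, not the paper's setting.

```python
import numpy as np

def bocpd_bernoulli(xs, hazard=1 / 50, a0=1.0, b0=1.0):
    """Run-length posterior for a Bernoulli stream (Adams-MacKay style).

    r[i] is the posterior probability that the current run contains i
    observations; a/b hold Beta posterior counts for each candidate run.
    """
    r = np.array([1.0])                 # before any data: run length 0
    a, b = np.array([a0]), np.array([b0])
    for x in xs:
        # Predictive probability of x under each candidate run's Beta posterior.
        pred = np.where(x == 1, a / (a + b), b / (a + b))
        growth = r * pred * (1 - hazard)          # run continues
        cp = (r * pred * hazard).sum()            # run resets (change-point)
        r = np.concatenate(([cp], growth))
        r /= r.sum()
        a = np.concatenate(([a0], a + x))         # fresh prior for new run
        b = np.concatenate(([b0], b + 1 - x))
    return r

# A clear change in success probability after 60 steps.
r = bocpd_bernoulli([1] * 60 + [0] * 60)
```

After the change, the posterior mass concentrates on a run length near 60, i.e. the filter "restarts" its beliefs at the detected change-point, which is the behavior R-BOCPD-UCRL2 exploits to reset its MDP estimates.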
arXiv Detail & Related papers (2023-04-01T05:26:41Z) - On the Unlikelihood of D-Separation [69.62839677485087]
We provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist.
For the PC Algorithm, while it is known that its worst-case guarantees fail on non-sparse graphs, we show that the same is true for the average case.
For UniformSGS, while it is known that the running time is exponential for existing edges, we show that in the average case this is the expected running time for most non-existing edges as well.
arXiv Detail & Related papers (2023-03-10T00:11:18Z) - Collective Robustness Certificates: Exploiting Interdependence in Graph Neural Networks [71.78900818931847]
In tasks like node classification, image segmentation, and named-entity recognition, we have a classifier that simultaneously outputs multiple predictions.
Existing adversarial robustness certificates consider each prediction independently and are thus overly pessimistic for such tasks.
We propose the first collective robustness certificate which computes the number of predictions that are simultaneously guaranteed to remain stable under perturbation.
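A toy counting argument shows why a collective view certifies more predictions than per-prediction certificates. Assume, purely for illustration, that each prediction has an additive "cost to flip" and the attacker has one shared budget; this is not the paper's certificate, just the intuition behind it.

```python
def independent_certified(costs, budget):
    """Per-prediction view: a prediction is certified only if even the
    entire budget spent on it alone cannot flip it."""
    return sum(c > budget for c in costs)

def collective_certified(costs, budget):
    """Collective view: the attacker must split one budget across all
    predictions, so only the cheapest-to-flip predictions whose costs
    fit in the budget can change; the rest are simultaneously stable."""
    spent = flipped = 0
    for c in sorted(costs):             # attacker flips cheapest first
        if spent + c <= budget:
            spent += c
            flipped += 1
    return len(costs) - flipped

# Four predictions with flip costs 2, 3, 5, 7 against a budget of 6.
ind = independent_certified([2, 3, 5, 7], 6)
col = collective_certified([2, 3, 5, 7], 6)
```

The independent certificate protects only the prediction costing 7, whereas the collective one observes that the budget cannot flip more than two predictions at once, so two are guaranteed stable.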
arXiv Detail & Related papers (2023-02-06T14:46:51Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches.
We design a computationally efficient algorithm that achieves near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms of $(S,A,H,K)$.
Our technical contributions are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z) - Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Complete and Incomplete Neural Network Verification [151.62491805851107]
We develop $\beta$-CROWN, a bound-propagation-based verifier that can fully encode per-neuron splits. $\beta$-CROWN is close to three orders of magnitude faster than LP-based BaB methods for robustness verification.
By terminating BaB early, our method can also be used for incomplete verification.
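The incomplete-verification idea can be illustrated with the simplest bound-propagation scheme, interval bound propagation (IBP). This is much looser than $\beta$-CROWN's linear bounds and branch-and-bound, but it shows the core mechanism: push an input box through the network and check whether the output bounds already decide the property. The toy network below is illustrative.

```python
import numpy as np

def ibp_layer(W, b, lo, hi):
    """Propagate an input box [lo, hi] through x -> W x + b."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    return (W_pos @ lo + W_neg @ hi + b,
            W_pos @ hi + W_neg @ lo + b)

def ibp_forward(layers, lo, hi):
    """Sound output bounds through linear layers with ReLUs in between."""
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_layer(W, b, lo, hi)
        if i < len(layers) - 1:                   # ReLU on hidden layers
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# A random 2-4-1 ReLU network and the input box [-1, 1]^2.
rng = np.random.default_rng(1)
layers = [(rng.standard_normal((4, 2)), rng.standard_normal(4)),
          (rng.standard_normal((1, 4)), rng.standard_normal(1))]
lo_out, hi_out = ibp_forward(layers, -np.ones(2), np.ones(2))

# Empirical soundness check: every sampled input's output stays in bounds.
contained = True
for _ in range(200):
    x = rng.uniform(-1, 1, 2)
    h = np.maximum(layers[0][0] @ x + layers[0][1], 0)
    y = layers[1][0] @ h + layers[1][1]
    contained &= bool(np.all(lo_out - 1e-9 <= y) and np.all(y <= hi_out + 1e-9))
```

If `hi_out` for a margin output were already below zero, the property would be verified without any branching; otherwise a complete method must split neurons, which is exactly where $\beta$-CROWN's per-neuron split constraints come in.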
arXiv Detail & Related papers (2021-03-11T11:56:54Z) - Logarithmic Regret for Reinforcement Learning with Linear Function Approximation [99.59319332864129]
We show that logarithmic regret is attainable under two recently proposed linear MDP assumptions.
To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.
arXiv Detail & Related papers (2020-11-23T17:25:00Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
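In the one-state, horizon-one special case, optimistic $Q$-learning reduces to a UCB-style index: each action's value estimate is inflated by a bonus that shrinks with its visit count, so under-explored actions stay attractive until the data rules them out. The sketch below shows that optimism principle on a two-armed bandit; it is not the paper's algorithm, and `pull` and `c` are illustrative.

```python
import math
import random

def ucb_bandit(pull, n_actions, T, c=1.0, seed=0):
    """UCB-style optimism: play argmax of mean + c*sqrt(log t / N)."""
    rng = random.Random(seed)
    mean = [0.0] * n_actions
    N = [0] * n_actions
    for t in range(1, T + 1):
        if t <= n_actions:                        # play each arm once first
            a = t - 1
        else:                                     # optimistic index
            a = max(range(n_actions),
                    key=lambda i: mean[i] + c * math.sqrt(math.log(t) / N[i]))
        r = pull(a, rng)
        N[a] += 1
        mean[a] += (r - mean[a]) / N[a]           # running-average update
    return mean, N

# Two Bernoulli arms with success probabilities 0.1 and 0.9.
probs = [0.1, 0.9]
means, counts = ucb_bandit(lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0,
                           n_actions=2, T=2000)
```

The visit counts concentrate on the better arm while its value estimate converges, mirroring how the gap $\Delta_{\min}$ controls how many suboptimal plays optimism can incur.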
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.