Global Sequential Testing for Multi-Stream Auditing
- URL: http://arxiv.org/abs/2602.21479v1
- Date: Wed, 25 Feb 2026 01:10:45 GMT
- Title: Global Sequential Testing for Multi-Stream Auditing
- Authors: Beepul Bharti, Ambar Pal, Jeremias Sulam
- Abstract summary: It is critical to continuously audit the performance of machine learning systems and detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis. We construct new sequential tests by merging test martingales, with different trade-offs in expected stopping time under sparse or dense alternative hypotheses.
- Score: 13.390852646411929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Across many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis asserting that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O\left(\ln\frac{k}{\alpha}\right)$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by merging test martingales, with different trade-offs in expected stopping time under sparse or dense alternative hypotheses. We further derive a new, balanced test that achieves an improved expected stopping time bound: it matches Bonferroni's in the sparse setting but naturally improves to $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.
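To make the merging ideas concrete, below is a minimal sketch in Python. It assumes $k$ independent Bernoulli data streams and per-stream likelihood-ratio test martingales against an illustrative point alternative $p_1$; the merging rules are the standard Ville-inequality constructions (Bonferroni tests each stream at level $\alpha/k$; the average of test martingales is itself a test martingale, so it can be compared against $1/\alpha$; the product merge is valid when streams are independent). The function names are hypothetical and this is not the authors' implementation; in particular, the paper's balanced test is a different construction that this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def lr_step(x, p0=0.5, p1=0.7):
    # One-step likelihood ratio for H0: Bernoulli(p0) against an
    # illustrative point alternative Bernoulli(p1).
    return (p1 / p0) if x else ((1 - p1) / (1 - p0))

def stopping_time(data, alpha=0.05, merge="bonferroni"):
    # Track k per-stream test martingales (in log space) and stop when
    # the merged process crosses its Ville-inequality threshold:
    #   bonferroni: any stream exceeds k/alpha (per-stream level alpha/k)
    #   average:    the mean of the k martingales exceeds 1/alpha
    #   product:    the product exceeds 1/alpha (requires independence)
    k, T = data.shape
    log_m = np.zeros(k)
    for t in range(T):
        log_m += np.log([lr_step(x) for x in data[:, t]])
        crossed = {
            "bonferroni": log_m.max() >= np.log(k / alpha),
            "average": np.mean(np.exp(log_m)) >= 1 / alpha,
            "product": log_m.sum() >= np.log(1 / alpha),
        }[merge]
        if crossed:
            return t + 1
    return None  # no rejection within the horizon

# Dense alternative: every one of the k streams deviates to p = 0.7.
k, T = 20, 5000
dense = rng.binomial(1, 0.7, size=(k, T))
for m in ("bonferroni", "average", "product"):
    print(f"{m:10s} stops at t = {stopping_time(dense, merge=m)}")
```

On this dense example the product merge stops fastest, reflecting the $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ scaling that the paper's balanced test also attains, while Bonferroni pays the extra $\ln k$. Under a sparse alternative the product's log grows at only one stream's rate while the null streams drift it downward, so it can fail entirely; Bonferroni remains reliable there. This is the trade-off the paper's tests negotiate.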
Related papers
- The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification [25.36548531839979]
We study the problem of sequentially testing individuals for a binary disease outcome whose true risk is governed by an unknown logistic model.
Our goal is to minimize the total number of costly tests required while guaranteeing that the fraction of misclassifications does not exceed a prespecified error tolerance.
This establishes the first no-regret guarantees for error-constrained logistic testing, with direct applications to cost-sensitive medical screening.
arXiv Detail & Related papers (2025-10-01T15:28:00Z)
- Doubly Robust Conditional Independence Testing with Generative Neural Networks [8.323172773256449]
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$.
We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions.
arXiv Detail & Related papers (2024-07-25T01:28:59Z)
- Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify the graph nodes $v$ where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
- Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift.
We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution.
We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z)
- Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain.
Our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
arXiv Detail & Related papers (2023-11-02T17:59:32Z)
- Sequential Kernelized Independence Testing [77.237958592189]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero.
We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
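The "simple but common approach" in the first line can be made concrete with an ordinary least-squares fit and a t-test on the coefficient of $X$; this is the parametric baseline that the model-free framework above is designed to replace, and the data-generating choices below are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))
X = Z @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
Y = Z.sum(axis=1) + rng.normal(size=n)  # Y depends on Z only: the null holds

# Linear-model baseline: fit Y ~ X + Z and t-test the coefficient on X.
design = sm.add_constant(np.column_stack([X, Z]))
fit = sm.OLS(Y, design).fit()
print(f"p-value for the coefficient on X: {fit.pvalues[1]:.3f}")
```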
arXiv Detail & Related papers (2022-11-03T17:55:50Z)
- Exact Paired-Permutation Testing for Structured Test Statistics [67.71280539312536]
We provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics.
Our exact algorithm was 10x faster than the Monte Carlo approximation with 20000 samples on a common dataset.
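For reference, the Monte Carlo approximation being benchmarked can be sketched as follows: under the null, the two scores within each pair are exchangeable, so each paired difference is sign-flipped independently with probability 1/2 and the statistic is recomputed. The mean-difference statistic and toy data are illustrative; the paper's exact algorithm for structured statistics is not what this sketch implements.

```python
import numpy as np

def mc_paired_permutation_pvalue(a, b, num_samples=20000, seed=0):
    # Under the null, (a_i, b_i) are exchangeable within each pair, so
    # flipping the sign of each paired difference is a valid resample.
    rng = np.random.default_rng(seed)
    diffs = np.asarray(a) - np.asarray(b)
    observed = abs(diffs.mean())
    flips = rng.choice([-1.0, 1.0], size=(num_samples, diffs.size))
    null_stats = np.abs((flips * diffs).mean(axis=1))
    # Add-one smoothing keeps the Monte Carlo p-value valid.
    return (1 + np.sum(null_stats >= observed)) / (num_samples + 1)

# Toy example: per-instance scores of two systems on the same test set.
rng = np.random.default_rng(1)
a = rng.normal(0.82, 0.05, size=200)
b = a - 0.01 + rng.normal(0.0, 0.02, size=200)
print(f"p ~= {mc_paired_permutation_pvalue(a, b):.4f}")
```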
arXiv Detail & Related papers (2022-05-03T11:00:59Z)
- Adjusted chi-square test for degree-corrected block models [13.122543280692641]
We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM).
We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $d_i$ grows to infinity.
Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution.
arXiv Detail & Related papers (2020-12-30T05:20:59Z)
- Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
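Dorfman's classical (noiseless) calculation referenced above is easy to reproduce: with prevalence $p$ and pool size $g$, each pool costs one test plus $g$ individual retests whenever it is positive, for an expected $1/g + 1 - (1-p)^g$ tests per person. The sketch below finds the pool size minimizing this quantity; it shows only the classical scheme, not the paper's noisy Bayesian design.

```python
import numpy as np

def dorfman_tests_per_person(p, g):
    # Expected tests per person under Dorfman's two-stage pooling:
    # 1/g for the pooled test, plus g individual retests with
    # probability 1 - (1-p)^g that the pool tests positive.
    return 1.0 / g + 1.0 - (1.0 - p) ** g

p = 0.02                   # 2% prevalence
sizes = np.arange(2, 31)   # candidate pool sizes
costs = dorfman_tests_per_person(p, sizes)
best = sizes[np.argmin(costs)]
print(f"best pool size: {best}, ~{costs.min():.3f} tests/person "
      f"(vs. 1.0 for individual testing)")
```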
This list is automatically generated from the titles and abstracts of the papers on this site.