Related papers: Ensemble Threshold Calibration for Stable Sensitivity Control

Ensemble Threshold Calibration for Stable Sensitivity Control

URL: http://arxiv.org/abs/2510.02116v1
Date: Thu, 02 Oct 2025 15:22:28 GMT
Title: Ensemble Threshold Calibration for Stable Sensitivity Control
Authors: John N. Daras,
Abstract summary: We present an end-to-end framework that achieves exact recall with sub-percent variance over tens of millions of geometry pairs.<n>Our approach consistently hits a recall target within a small error, decreases redundant verifications relative to other calibrations, and runs end-to-end on a single TPU v3 core.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Precise recall control is critical in large-scale spatial conflation and entity-matching tasks, where missing even a few true matches can break downstream analytics, while excessive manual review inflates cost. Classical confidence-interval cuts such as Clopper-Pearson or Wilson provide lower bounds on recall, but they routinely overshoot the target by several percentage points and exhibit high run-to-run variance under skewed score distributions. We present an end-to-end framework that achieves exact recall with sub-percent variance over tens of millions of geometry pairs, while remaining TPU-friendly. Our pipeline starts with an equigrid bounding-box filter and compressed sparse row (CSR) candidate representation, reducing pair enumeration by two orders of magnitude. A deterministic xxHash bootstrap sample trains a lightweight neural ranker; its scores are propagated to all remaining pairs via a single forward pass and used to construct a reproducible, score-decile-stratified calibration set. Four complementary threshold estimators - Clopper-Pearson, Jeffreys, Wilson, and an exact quantile - are aggregated via inverse-variance weighting, then fused across nine independent subsamples. This ensemble reduces threshold variance compared to any single method. Evaluated on two real cadastral datasets (approximately 6.31M and 67.34M pairs), our approach consistently hits a recall target within a small error, decreases redundant verifications relative to other calibrations, and runs end-to-end on a single TPU v3 core.

Related papers

Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation based analysis for the Euler method that overcomes limitations.<n>Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees.<n>Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z)
SpatialBench-UC: Uncertainty-Aware Evaluation of Spatial Prompt Following in Text-to-Image Generation [0.0]
SpatialBench-UC is a small, reproducible benchmark for pairwise spatial relations.<n>We release a benchmark package, versioned prompts, pinned configs, per-sample checker outputs, and report tables.<n>We evaluate three baselines, Stable Diffusion 1.5, SD 1.5 BoxDiff, and SD 1.4 GLIGEN.
arXiv Detail & Related papers (2026-01-19T23:37:10Z)
One-Shot Federated Ridge Regression: Exact Recovery via Sufficient Statistic Aggregation [0.7106986689736825]
Federated ridge regression is a distributed equilibrium problem where each client computes local sufficient statistics and transmits them once.<n>We establish differential privacy guarantees where noise is injected once per client, eliminating the composition penalty that degrades privacy in multi-round protocols.<n>Experiments on synthetic heterogeneous regression demonstrate that one-shot fusion matches FedAvg accuracy while requiring up to $38times$ less communication.
arXiv Detail & Related papers (2026-01-13T04:47:22Z)
Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints.<n>We present two simple instance-wise minimax reformulations to close the approximation gap of PAUC optimization.<n>The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(-2/3)$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z)
Distributed Training under Packet Loss [8.613477072763404]
Leveraging unreliable connections will reduce latency but may sacrifice model accuracy and convergence once packets are dropped.<n>We introduce a principled, end-to-end solution that preserves accuracy and convergence guarantees under genuine packet loss.<n>This work bridges the gap between communication-efficient protocols and the accuracy and guarantees demanded by modern large-model training.
arXiv Detail & Related papers (2025-07-02T11:07:20Z)
Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores.<n>We flexibly estimate the joint cumulative distribution function (CDF) of the scores.<n>Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.<n>We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.<n>We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
Optimal Cross-Validation for Sparse Linear Regression [5.156484100374059]
We use k-fold cross-validation to select sparsity and robustness of linear regressors.<n>Cross-validation substantially increases the computational cost of sparse regression.<n>We improve upon this state of affairs by solving 50-80% fewer mixed-integer optimization problems.
arXiv Detail & Related papers (2023-06-26T17:02:45Z)
CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples. CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling. We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.