Related papers: Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality

Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality

URL: http://arxiv.org/abs/2510.18982v1
Date: Tue, 21 Oct 2025 18:05:42 GMT
Title: Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
Authors: Arpan Mukherjee, Marcello Bullo, Debabrota Basu, Deniz Gündüz,
Abstract summary: Test-time scaling with verification has shown promise in improving the performance of large language models.<n>The effect of verification manifests through interactions of three quantities: (i) the generator's coverage, (ii) the verifier's region of convergence (ROC), and (iii) the sampling algorithm's sub-optimality.<n>We frame verifiable test-time scaling as a transport problem. This characterizes the interaction of coverage, ROC, and sub-optimality.
Score: 53.03186946689658
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While test-time scaling with verification has shown promise in improving the performance of large language models (LLMs), the role of the verifier and its imperfections remain underexplored. The effect of verification manifests through interactions of three quantities: (i) the generator's coverage, (ii) the verifier's region of convergence (ROC), and (iii) the sampling algorithm's sub-optimality. Though recent studies capture subsets of these factors, a unified framework quantifying the geometry of their interplay is missing. We frame verifiable test-time scaling as a transport problem. This characterizes the interaction of coverage, ROC, and sub-optimality, and uncovers that the sub-optimality--coverage curve exhibits three regimes. A transport regime -- where sub-optimality increases with coverage, a policy improvement regime -- where sub-optimality may decrease with coverage, depending on the verifier's ROC, and a saturation regime -- where sub-optimality plateaus, unaffected by coverage. We further propose and analyze two classes of sampling algorithms -- sequential and batched, and examine how their computational complexities shape these trade-offs. Empirical results with Qwen, Llama, and Gemma models corroborate our theoretical findings.

Related papers

SPARTA: $χ^2$-calibrated, risk-controlled exploration-exploitation for variational quantum algorithms [0.0]
Variational quantum algorithms face a fundamental trainability crisis: barren plateaus render optimization exponentially difficult as system size grows.<n>We present the sequential plateau-adaptive regime-testing algorithm (SPARTA) that provides explicit, anytime-valid risk control for quantum optimization.
arXiv Detail & Related papers (2025-11-24T13:54:01Z)
Bifidelity Karhunen-Loève Expansion Surrogate with Active Learning for Random Fields [0.4899818550820576]
We present a bifidelity Karhunen-Loeve expansion (KLE) surrogate model for field-valued quantities of interest (QoIs) under uncertain inputs.<n>We form an active learning strategy that adaptively selects new HF evaluations based on the surrogate's generalization error.<n>New HF samples are then acquired by maximizing an expected improvement criterion, targeting regions of high surrogate error.
arXiv Detail & Related papers (2025-11-05T04:14:44Z)
Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation.<n>A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios.<n>We introduce Continual AQA (CAQA), which equips with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z)
Topological Adaptive Least Mean Squares Algorithms over Simplicial Complexes [13.291627429657416]
This paper introduces a novel adaptive framework for processing dynamic flow signals over simplicial complexes.<n>We present a topological LMS algorithm that efficiently processes streaming signals observed over time-varying edge subsets.
arXiv Detail & Related papers (2025-05-29T06:55:19Z)
Fractional Correspondence Framework in Detection Transformer [13.388933240897492]
The Detection Transformer (DETR) has significantly simplified the matching process in object detection tasks.<n>This algorithm facilitates optimal one-to-one matching of predicted bounding boxes to ground-truth annotations during training.<n>We propose a flexible matching strategy that captures the cost of aligning predictions with ground truths to find the most accurate correspondences.
arXiv Detail & Related papers (2025-03-06T05:29:20Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
Verification of Geometric Robustness of Neural Networks via Piecewise Linear Approximation and Lipschitz Optimisation [57.10353686244835]
We address the problem of verifying neural networks against geometric transformations of the input image, including rotation, scaling, shearing, and translation. The proposed method computes provably sound piecewise linear constraints for the pixel values by using sampling and linear approximations in combination with branch-and-bound Lipschitz. We show that our proposed implementation resolves up to 32% more verification cases than present approaches.
arXiv Detail & Related papers (2024-08-23T15:02:09Z)
Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection [10.177917426690701]
Hierarchical Federated Learning (HFL) faces the challenge of adversarial or unreliable vehicles in vehicular networks. Our study introduces a novel framework that integrates dynamic vehicle selection and robust anomaly detection mechanisms. Our proposed algorithm demonstrates remarkable resilience even under intense attack conditions.
arXiv Detail & Related papers (2024-05-25T18:31:20Z)
Efficient Transfer Learning via Causal Bounds [8.981637739384674]
We analyze how causal side-information accelerates online learning, and experiments on data reduction.<n>Our analysis precisely characterizes when how causal side-information accelerates online learning, and experiments on data reduction.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet. We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
Comparing Probability Distributions with Conditional Transport [63.11403041984197]
We propose conditional transport (CT) as a new divergence and approximate it with the amortized CT (ACT) cost. ACT amortizes the computation of its conditional transport plans and comes with unbiased sample gradients that are straightforward to compute. On a wide variety of benchmark datasets generative modeling, substituting the default statistical distance of an existing generative adversarial network with ACT is shown to consistently improve the performance.
arXiv Detail & Related papers (2020-12-28T05:14:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.