Scalable and adaptive prediction bands with kernel sum-of-squares
- URL: http://arxiv.org/abs/2505.21039v1
- Date: Tue, 27 May 2025 11:21:17 GMT
- Title: Scalable and adaptive prediction bands with kernel sum-of-squares
- Authors: Louis Allain, Sébastien da Veiga, Brian Staber
- Abstract summary: Conformal Prediction (CP) is a popular framework for constructing prediction bands with valid coverage in finite samples. We build upon recent ideas that recast the CP problem as a statistical learning problem, directly targeting coverage and adaptivity.
- Score: 0.5530212768657544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conformal Prediction (CP) is a popular framework for constructing prediction bands with valid coverage in finite samples, while being free of any distributional assumption. A well-known limitation of conformal prediction is the lack of adaptivity, although several works have introduced practically efficient alternative procedures. In this work, we build upon recent ideas that recast the CP problem as a statistical learning problem, directly targeting coverage and adaptivity. This statistical learning problem is based on reproducing kernel Hilbert spaces (RKHS) and kernel sum-of-squares (SoS) methods. First, we extend previous results with a general representer theorem and exhibit the dual formulation of the learning problem. Crucially, this dual formulation can be solved efficiently by accelerated gradient methods with several hundred or several thousand samples, unlike previous strategies based on off-the-shelf semidefinite programming algorithms. Second, we introduce a new hyperparameter tuning strategy tailored specifically to target adaptivity through bounds on test-conditional coverage. This strategy, based on the Hilbert-Schmidt Independence Criterion (HSIC), is introduced here to tune kernel lengthscales in our framework, but has broader applicability since it could be used in any CP algorithm where the score function is learned. Finally, extensive experiments are conducted to show how our method compares to related work. All figures can be reproduced with the accompanying code.
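To make the HSIC-based tuning strategy concrete, here is a minimal sketch of the idea, assuming a hypothetical `fit_cover(ls)` callback that fits the kernel SoS band with lengthscale `ls` and returns 0/1 coverage indicators on held-out inputs; the biased empirical HSIC estimator and the Gaussian/linear kernels below are standard defaults, not necessarily the paper's exact choices.

```python
import numpy as np

def gaussian_gram(Z, lengthscale):
    """Gram matrix of a Gaussian kernel exp(-||z - z'||^2 / (2 l^2))."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2

def select_lengthscale(X_val, fit_cover, candidates):
    """Pick the lengthscale whose held-out coverage indicators are least
    dependent on the inputs: near-zero HSIC suggests coverage close to
    test-conditional coverage.  `fit_cover` is an assumed callback, not
    part of the paper's published API."""
    # Median heuristic for the input kernel, kept fixed across candidates.
    sq = np.sum((X_val[:, None, :] - X_val[None, :, :]) ** 2, axis=-1)
    Kx = gaussian_gram(X_val, np.sqrt(np.median(sq[sq > 0]) / 2.0))
    scores = {}
    for ls in candidates:
        covered = fit_cover(ls).astype(float)  # 0/1 coverage on X_val
        Ky = np.outer(covered, covered)        # linear kernel on indicators
        scores[ls] = hsic(Kx, Ky)
    return min(scores, key=scores.get)
```

Because the selection only needs coverage indicators, the same routine could wrap any CP method with a learned score function, which is the broader applicability the abstract points to.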
Related papers
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Most analyses, however, require the Lipschitz condition, which is often not met in common machine learning schemes.
arXiv Detail & Related papers (2025-07-11T15:36:48Z) - Epistemic Uncertainty in Conformal Scores: A Unified Approach [2.449909275410288]
Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty. We introduce $\texttt{EPICSCORE}$, a model-agnostic approach that enhances any conformal score by explicitly integrating epistemic uncertainty. $\texttt{EPICSCORE}$ adaptively expands predictive intervals in regions with limited data while maintaining compact intervals where data is abundant.
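The paper's actual construction is not reproduced here, but a rough illustration of the same expand-where-data-is-scarce behaviour is a split-conformal score normalized by a per-point epistemic-uncertainty estimate; `predict` and `uncertainty` below are assumed user-supplied callables (e.g. an ensemble mean and standard deviation), not $\texttt{EPICSCORE}$ itself.

```python
import numpy as np

def uncertainty_scaled_band(predict, uncertainty, X_cal, y_cal, X_test, alpha=0.1):
    """Split-conformal intervals whose width scales with an epistemic
    uncertainty estimate (illustrative sketch, not the paper's method)."""
    u_cal = np.maximum(uncertainty(X_cal), 1e-8)     # guard against zero division
    scores = np.abs(y_cal - predict(X_cal)) / u_cal  # uncertainty-normalized residuals
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level)                   # split-conformal quantile
    mu, u = predict(X_test), uncertainty(X_test)
    return mu - q * u, mu + q * u                    # wide where uncertainty is high
```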
arXiv Detail & Related papers (2025-02-10T19:42:54Z) - Optimal Transport-based Conformal Prediction [8.302146576157497]
Conformal Prediction (CP) is a principled framework for uncertainty quantification in black-box learning models. We introduce a novel CP procedure that handles prediction score functions through an optimal transport lens. We then adapt our method to multi-output regression and multiclass classification.
arXiv Detail & Related papers (2025-01-31T09:48:28Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning [55.467047686093025]
A common approach to alleviating such forgetting is to rehearse samples from prior tasks during fine-tuning. We propose a sampling scheme, $\texttt{mix-cd}$, that prioritizes rehearsal of "collateral damage" samples. Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings.
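Reading the summary, "collateral damage" samples appear to be prior-task examples answered correctly before fine-tuning but not after; the sketch below shows one plausible way to flag and up-weight them in a rehearsal buffer. The helper names and mixing ratio are hypothetical, and this is not the paper's exact $\texttt{mix-cd}$ procedure.

```python
import numpy as np

def collateral_damage_mask(model_before, model_after, X_prior, y_prior):
    """Flag prior-task samples correct before fine-tuning but wrong after
    ('collateral damage', as described in the summary above)."""
    ok_before = model_before.predict(X_prior) == y_prior
    ok_after = model_after.predict(X_prior) == y_prior
    return ok_before & ~ok_after

def rehearsal_batch(X_prior, y_prior, damage_mask, batch_size, rng, p_damage=0.7):
    """Sample a rehearsal batch that up-weights collateral-damage samples;
    p_damage is an assumed knob, not a value from the paper."""
    damaged = np.flatnonzero(damage_mask)
    intact = np.flatnonzero(~damage_mask)
    n_dmg = min(len(damaged), int(round(p_damage * batch_size)))
    n_int = min(len(intact), batch_size - n_dmg)
    idx = np.concatenate([
        rng.choice(damaged, size=n_dmg, replace=False) if n_dmg else np.empty(0, int),
        rng.choice(intact, size=n_int, replace=False) if n_int else np.empty(0, int),
    ])
    return X_prior[idx], y_prior[idx]
```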
arXiv Detail & Related papers (2024-02-12T22:32:12Z) - Quantum Annealing Solutions for the Closest String Problem with D-Wave
Systems [0.0]
The Closest String Problem is an NP-complete problem that appears frequently in bioinformatics and coding theory.
Two QUBO formulations are proposed, one being a slight modification of the other.
D-Wave annealers are used to solve them, with guidelines provided for handling platform-specific concerns that affect solution optimality.
arXiv Detail & Related papers (2023-10-19T16:03:25Z) - Stochastic Gradient Methods with Preconditioned Updates [47.23741709751474]
There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned.
Here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian.
We prove convergence both when only smoothness is assumed and when the PL condition additionally holds.
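For intuition, here is a minimal NumPy sketch of that diagonal estimator: with Rademacher probes z, E[z * (Hz)] equals diag(H), so the diagonal can be estimated from Hessian-vector products alone. The `hvp` oracle and the damping/update details are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def hutchinson_diag_hessian(hvp, dim, n_probes, rng):
    """Estimate diag(H) via E[z * (H z)] with Rademacher probes z,
    using only Hessian-vector products (no full Hessian is formed)."""
    est = np.zeros(dim)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=dim)
        est += z * hvp(z)
    return est / n_probes

def preconditioned_step(w, grad, hvp, rng, lr=0.1, n_probes=4, damping=1e-3):
    """One preconditioned update: w <- w - lr * g / (|diag(H)| + damping).
    The absolute value and damping guard against tiny or negative
    curvature estimates (an illustrative choice)."""
    d = hutchinson_diag_hessian(hvp, w.size, n_probes, rng)
    return w - lr * grad / (np.abs(d) + damping)
```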
arXiv Detail & Related papers (2022-06-01T07:38:08Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
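A simplified sketch of how a SAG estimator slots into a Frank-Wolfe loop is given below: keep a table of the most recent per-sample gradients, refresh one entry per iteration, and feed the running average to a linear minimization oracle. The composite non-smooth term of the actual paper is omitted, and `grad_i` and `lmo` are assumed user-supplied callables.

```python
import numpy as np

def sag_frank_wolfe(grad_i, n_samples, lmo, w0, n_iters, rng):
    """Frank-Wolfe (conditional gradient) with a stochastic average
    gradient (SAG) estimator: one per-sample gradient is refreshed per
    iteration.  Simplified sketch; the paper's composite setting with a
    non-smooth term is not handled here."""
    w = w0.copy()
    table = np.zeros((n_samples, w0.size))     # stored per-sample gradients
    avg = np.zeros(w0.size)                    # their running average
    for t in range(1, n_iters + 1):
        i = rng.integers(n_samples)
        g_new = grad_i(i, w)                   # one sample per iteration
        avg += (g_new - table[i]) / n_samples  # maintain the SAG average
        table[i] = g_new
        s = lmo(avg)                           # argmin over C of <avg, s>
        gamma = 2.0 / (t + 2)                  # standard FW step size
        w = (1 - gamma) * w + gamma * s
    return w
```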
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the given function class.
Experiments show that our algorithm learns valid prediction sets and improves efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression [20.00435452480056]
We develop a novel response-adaptive partition strategy to overcome the limitation of the divide-and-conquer method.
We show that the proposed estimator has a smaller asymptotic mean squared error (AMSE) than the classical dacKRR estimator under mild conditions.
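For context, here is a minimal sketch of the classical dacKRR baseline being improved on: partition the data (uniformly at random here, whereas the paper's contribution is a response-adaptive partition with oversampling), fit kernel ridge regression on each block, and average the block predictions. Kernel and regularization choices are illustrative.

```python
import numpy as np

def gaussian_kernel(A, B, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * lengthscale**2))

def dac_krr_predict(X, y, X_test, n_blocks, lam=1e-2, rng=None):
    """Classical divide-and-conquer KRR with a uniform random partition
    (the baseline; not the response-adaptive scheme proposed above)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    preds = np.zeros(len(X_test))
    for block in np.array_split(rng.permutation(len(X)), n_blocks):
        Xb, yb = X[block], y[block]
        K = gaussian_kernel(Xb, Xb)
        alpha = np.linalg.solve(K + len(Xb) * lam * np.eye(len(Xb)), yb)
        preds += gaussian_kernel(X_test, Xb) @ alpha
    return preds / n_blocks
```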
arXiv Detail & Related papers (2021-07-13T04:01:04Z) - Root-finding Approaches for Computing Conformal Prediction Set [18.405645120971496]
Conformal prediction constructs a confidence region for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations.
We exploit the fact that, often, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding software.
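Concretely, a candidate response y belongs to the (full) conformal set when its conformity p-value exceeds alpha, so when the set is an interval each endpoint is a sign change of pvalue(y) - alpha and can be located with an off-the-shelf root-finder. In this sketch `pvalue` is an assumed user-supplied function and the bracketing width is an illustrative choice.

```python
from scipy.optimize import brentq

def conformal_interval(pvalue, y_hat, alpha=0.1, half_width=1e3):
    """Recover the endpoints of an interval-shaped conformal set
    {y : pvalue(y) > alpha} by root-finding.  Assumes pvalue(y_hat) > alpha
    and that pvalue drops below alpha within half_width of y_hat."""
    f = lambda y: pvalue(y) - alpha              # sign change at each endpoint
    lo = brentq(f, y_hat - half_width, y_hat)    # left boundary
    hi = brentq(f, y_hat, y_hat + half_width)    # right boundary
    return lo, hi
```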
arXiv Detail & Related papers (2021-04-14T06:41:12Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.