CoVeR: Conformal Calibration for Versatile and Reliable Autoregressive Next-Token Prediction
- URL: http://arxiv.org/abs/2509.04733v1
- Date: Fri, 05 Sep 2025 01:07:12 GMT
- Title: CoVeR: Conformal Calibration for Versatile and Reliable Autoregressive Next-Token Prediction
- Authors: Yuzhu Chen, Yingjie Wang, Shunyu Liu, Yongcheng Jing, Dacheng Tao
- Abstract summary: CoVeR is a model-free decoding strategy that balances search efficiency with the need for versatile trajectories. We show that CoVeR simultaneously maintains a compact search space and ensures high coverage probability over desirable trajectories.
- Score: 49.09876340754804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive pre-trained models combined with decoding methods have achieved impressive performance on complex reasoning tasks. While mainstream decoding strategies such as beam search can generate plausible candidate sets, they often lack provable coverage guarantees and struggle to balance search efficiency with the need for versatile trajectories, particularly those involving long-tail sequences that are essential in certain real-world applications. To address these limitations, we propose CoVeR, a novel model-free decoding strategy within the conformal prediction framework that simultaneously maintains a compact search space and ensures high coverage probability over desirable trajectories. Theoretically, we establish a PAC-style generalization bound, guaranteeing that CoVeR asymptotically achieves a coverage rate of at least $1 - \alpha$ for any target level $\alpha \in (0,1)$.
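To make the $1 - \alpha$ coverage target in the abstract concrete, here is a minimal sketch of generic split conformal calibration applied to a single next-token step. It is not the CoVeR procedure, which the abstract does not spell out. The function names, the nonconformity score `1 - p`, and the toy inputs are all assumptions made for illustration.

```python
import math


def conformal_threshold(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    `scores` are nonconformity scores of the true tokens on a held-out
    calibration set (assumed here to be 1 - model probability).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # "keep everything" threshold carries the guarantee.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every candidate token whose score 1 - p is within the threshold."""
    return [tok for tok, p in probs.items() if 1.0 - p <= threshold]


# Toy calibration scores and one hypothetical next-token distribution.
calibration_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
threshold = conformal_threshold(calibration_scores, alpha=0.5)
candidates = prediction_set({"a": 0.6, "b": 0.3, "c": 0.1}, threshold)
```

If the calibration and test scores are exchangeable, the set built this way contains the true token with probability at least $1 - \alpha$. A smaller $\alpha$ raises the threshold and enlarges the candidate set, which is the trade-off between coverage and a compact search space that the abstract refers to.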
Related papers
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - GARDO: Reinforcing Diffusion Models without Reward Hacking [54.841464430913476]
Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. The mismatch often leads to reward hacking, where proxy scores increase while real image quality deteriorates and generation diversity collapses. We propose Gated and Adaptive Regularization with Diversity-aware Optimization (GARDO) to address the competing demands of sample efficiency, effective exploration, and mitigation of reward hacking.
arXiv Detail & Related papers (2025-12-30T10:55:45Z) - Probabilistic Robustness Analysis in High Dimensional Space: Application to Semantic Segmentation Network [6.587910936799125]
We introduce a probabilistic verification framework that is both architecture-agnostic and scalable to high-dimensional outputs. Our approach combines sampling-based reachability analysis with conformal inference (CI) to deliver provable guarantees. We demonstrate that our method provides reliable safety guarantees while substantially tightening bounds compared to SOTA.
arXiv Detail & Related papers (2025-09-15T12:25:25Z) - Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL [6.224756774400233]
We study offline reinforcement learning in average-reward MDPs, which presents increased challenges from the perspectives of distribution shift and non-uniform coverage. We develop sharp guarantees depending only on the target policy, specifically the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL.
arXiv Detail & Related papers (2025-06-26T00:22:39Z) - COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z) - Rényi security framework against coherent attacks applied to decoy-state QKD [0.0]
We develop a flexible and robust framework for finite-size security proofs of quantum key distribution protocols under coherent attacks. Our approach achieves high finite-size key rates across a broad class of protocols while imposing minimal requirements.
arXiv Detail & Related papers (2025-04-16T16:54:23Z) - Robust Optimization with Diffusion Models for Green Security [49.68562792424776]
In green security, defenders must forecast adversarial behavior, such as poaching, illegal logging, and illegal fishing, to plan effective patrols. We propose a conditional diffusion model for adversary behavior modeling, leveraging its strong distribution-fitting capabilities. We introduce a mixed strategy of mixed strategies and employ a twisted Sequential Monte Carlo (SMC) sampler for accurate sampling.
arXiv Detail & Related papers (2025-02-19T05:30:46Z) - Guaranteed Generation from Large Language Models [28.157857382660563]
Large language models (LLMs) are increasingly used across various applications. We propose GUARD, a simple yet effective approach that combines an autoregressive proposal distribution with rejection sampling. These experiments show that GUARD achieves perfect constraint satisfaction while almost preserving the ideal distribution with highly improved inference efficiency.
arXiv Detail & Related papers (2024-10-09T09:39:55Z) - Scalable Online Exploration via Coverability [45.66375686120087]
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation.
We introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata.
$L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability.
arXiv Detail & Related papers (2024-03-11T10:14:06Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe exploration requirement would affect the sample complexity needed to achieve the desired optimality of the policy obtained in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively.
arXiv Detail & Related papers (2022-06-28T15:00:45Z) - Selective Classification via One-Sided Prediction [54.05407231648068]
One-sided prediction (OSP) based relaxation yields an SC scheme that attains near-optimal coverage in the practically relevant high target accuracy regime.
We theoretically derive generalization bounds for SC and OSP, and empirically show that our scheme strongly outperforms state-of-the-art methods in coverage at small error levels.
arXiv Detail & Related papers (2020-10-15T16:14:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.