A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates
- URL: http://arxiv.org/abs/2504.01225v1
- Date: Tue, 01 Apr 2025 22:25:00 GMT
- Title: A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates
- Authors: Gonçalo Gomes, Chrysoula Zerva, Bruno Martins,
- Abstract summary: We propose a simple yet effective strategy for generating and calibrating CLIPScore distributions.<n>Our method effectively detects misaligned words, while providing formal guarantees aligned with desired risk levels.
- Score: 3.902360015414256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores current limitations of learned image captioning evaluation metrics, specifically the lack of granular assessment for individual word misalignments within captions, and the reliance on single-point quality estimates without considering uncertainty. To address these limitations, we propose a simple yet effective strategy for generating and calibrating CLIPScore distributions. Leveraging a model-agnostic conformal risk control framework, we calibrate CLIPScore values for task-specific control variables, to tackle the aforementioned two limitations. Experimental results demonstrate that using conformal risk control, over the distributions produced with simple methods such as input masking, can achieve competitive performance compared to more complex approaches. Our method effectively detects misaligned words, while providing formal guarantees aligned with desired risk levels, and improving the correlation between uncertainty estimations and prediction errors, thus enhancing the overall reliability of caption evaluation metrics.
Related papers
- Conformal Segmentation in Industrial Surface Defect Detection with Statistical Guarantees [2.0257616108612373]
In industrial settings, surface defects on steel can significantly compromise its service life and elevate potential safety risks.
Traditional defect detection methods predominantly rely on manual inspection, which suffers from low efficiency and high costs.
We develop a statistically rigorous threshold based on a user-defined risk level to identify high-probability defective pixels in test images.
We demonstrate robust and efficient control over the expected test set error rate across varying calibration-to-test ratios.
arXiv Detail & Related papers (2025-04-24T16:33:56Z) - Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction [0.0]
We propose a model-agnostic uncertainty quantification method that integrates dynamic threshold calibration and cross-modal consistency verification.
We show that the framework achieves stable performance across varying calibration-to-test split ratios, underscoring its robustness for real-world deployment in healthcare, autonomous systems, and other safety-sensitive domains.
This work bridges the gap between theoretical reliability and practical applicability in multi-modal AI systems, offering a scalable solution for hallucination detection and uncertainty-aware decision-making.
arXiv Detail & Related papers (2025-04-24T15:39:46Z) - SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU)
We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level.
Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
arXiv Detail & Related papers (2025-04-19T03:01:45Z) - Conditional Conformal Risk Adaptation [9.559062601251464]
We develop a new score function for creating adaptive prediction sets that significantly improve conditional risk control for segmentation tasks.
We introduce a specialized probability calibration framework that enhances the reliability of pixel-wise inclusion estimates.
Our experiments on polyp segmentation demonstrate that all three methods provide valid marginal risk control and deliver more consistent conditional risk control.
arXiv Detail & Related papers (2025-04-10T10:01:06Z) - Risk-Calibrated Affective Speech Recognition via Conformal Coverage Guarantees: A Stochastic Calibrative Framework for Emergent Uncertainty Quantification [0.0]
Traffic safety challenges arising from extreme driver emotions highlight the urgent need for reliable emotion recognition systems.<n>Traditional deep learning approaches in speech emotion recognition suffer from overfitting and poorly calibrated confidence estimates.<n>We propose a framework integrating Conformal Prediction (CP) and Risk Control,using Mel-spectrogram features processed through a pre-trained neural network.
arXiv Detail & Related papers (2025-03-24T12:26:28Z) - Rectifying Conformity Scores for Better Conditional Coverage [75.73184036344908]
We present a new method for generating confidence sets within the split conformal prediction framework.<n>Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage.
arXiv Detail & Related papers (2025-02-22T19:54:14Z) - ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2024-06-29T17:33:07Z) - Decoupling of neural network calibration measures [45.70855737027571]
We investigate the coupling of different neural network calibration measures with a special focus on the Area Under Sparsification Error curve (AUSE) metric.
We conclude that the current methodologies leave a degree of freedom, which prevents a unique model for the homologation of safety-critical functionalities.
arXiv Detail & Related papers (2024-06-04T15:21:37Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z) - B-BACN: Bayesian Boundary-Aware Convolutional Network for Crack
Characterization [4.447467536572625]
Uncertainty of crack detection is challenging due to various factors, such as measurement noises, signal processing, and model simplifications.
A machine learning-based approach is proposed to quantify both uncertainty and aleatoric uncertainties concurrently.
We introduce a Boundary-Aware Convolutional Network (B-BACN) that emphasizes uncertainty-aware boundary refinement to generate precise and reliable crack boundary detections.
arXiv Detail & Related papers (2023-02-14T04:50:42Z) - Approximate Conditional Coverage via Neural Model Approximations [0.030458514384586396]
We analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage.
We demonstrate the potential for substantial (and otherwise unknowable) under-coverage with split-conformal alternatives with marginal coverage guarantees.
arXiv Detail & Related papers (2022-05-28T02:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.