Did You Mean...? Confidence-based Trade-offs in Semantic Parsing
- URL: http://arxiv.org/abs/2303.16857v3
- Date: Fri, 20 Oct 2023 12:54:30 GMT
- Title: Did You Mean...? Confidence-based Trade-offs in Semantic Parsing
- Authors: Elias Stengel-Eskin and Benjamin Van Durme
- Abstract summary: We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
- Score: 52.28988386710333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We illustrate how a calibrated model can help balance common trade-offs in
task-oriented parsing. In a simulated annotator-in-the-loop experiment, we show
that well-calibrated confidence scores allow us to balance cost with annotator
load, improving accuracy with a small number of interactions. We then examine
how confidence scores can help optimize the trade-off between usability and
safety. We show that confidence-based thresholding can substantially reduce the
number of incorrect low-confidence programs executed; however, this comes at a
cost to usability. We propose the DidYouMean system, which better balances
usability and safety.
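Concretely, the thresholding trade-off can be sketched as a gate on a calibrated parser score. A minimal sketch follows; the function names, dummy values, and the 0.7 threshold are our own illustration, not the authors' system:

```python
from typing import Optional, Tuple

def parse_with_confidence(utterance: str) -> Tuple[str, float]:
    """Stand-in for a calibrated parser: returns (program, confidence in [0, 1])."""
    # A real system would decode a program and score it with calibrated probabilities.
    return f"Plan(query={utterance!r})", 0.42  # dummy values for illustration

def maybe_execute(utterance: str, threshold: float = 0.7) -> Optional[str]:
    """Execute only high-confidence parses; defer low-confidence ones."""
    program, confidence = parse_with_confidence(utterance)
    if confidence >= threshold:
        return program  # confident enough to execute
    return None         # low confidence: confirm with the user ("Did you mean...?") instead

print(maybe_execute("set an alarm for 7am"))  # None here: 0.42 < 0.7, so we defer
```

Raising the threshold trades usability (more deferred requests) for safety (fewer incorrect programs executed), which is the trade-off the abstract describes.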
Related papers
- Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators [6.403926452181712]
Large Language Models (LLMs) tend to be unreliable in the factuality of their answers.
We present a survey and empirical comparison of estimators of factual confidence.
Our experiments indicate that trained hidden-state probes provide the most reliable confidence estimates.
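As a rough sketch of what a trained hidden-state probe looks like in practice (our illustration with random stand-in data, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins: hidden-state vectors for 1000 model answers (dim 768) and
# binary labels marking whether each answer was factually correct.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))
is_correct = rng.integers(0, 2, size=1000)

# The probe: a small classifier trained on frozen LLM hidden states.
probe = LogisticRegression(max_iter=1000).fit(hidden_states, is_correct)

# Its predicted probability acts as the factual-confidence estimate.
print(probe.predict_proba(hidden_states[:3])[:, 1])
```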
arXiv Detail & Related papers (2024-06-19T10:11:37Z)
- Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
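One simple way to quantify such alignment is a rank correlation between internal token probabilities and verbalized confidences; a sketch with made-up numbers (this is our illustration of the idea, not the paper's exact metric):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical paired measurements over a batch of questions: the probability
# the model assigned to its own answer tokens (internal confidence) and the
# confidence it verbalized when asked to rate itself on a 0-1 scale.
internal_prob = np.array([0.91, 0.55, 0.88, 0.40, 0.73])
expressed_conf = np.array([0.90, 0.60, 0.80, 0.50, 0.70])

# Rank correlation as one simple alignment measure.
rho, _ = spearmanr(internal_prob, expressed_conf)
print(f"confidence-probability alignment (Spearman rho): {rho:.2f}")
```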
arXiv Detail & Related papers (2024-05-25T15:42:04Z)
- When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose a CONfidence-Quality-ORDer-preserving alignment approach (CONQORD).
It integrates quality reward and order-preserving alignment reward functions.
Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
- U-Trustworthy Models. Reliability, Competence, and Confidence in Decision-Making [0.21756081703275998]
We present a precise mathematical definition of trustworthiness, termed $\mathcal{U}$-trustworthiness.
Within the context of $\mathcal{U}$-trustworthiness, we prove that properly-ranked models are inherently $\mathcal{U}$-trustworthy.
We advocate for the adoption of the AUC metric as the preferred measure of trustworthiness.
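Using AUC as the trustworthiness measure is straightforward to operationalize; a minimal example on toy data (our illustration, using scikit-learn):

```python
from sklearn.metrics import roc_auc_score

# Toy labels and model scores. A properly-ranked model places positives above
# negatives, which is exactly what AUC measures (1.0 = perfect ranking).
y_true = [0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.90]

print(f"AUC: {roc_auc_score(y_true, scores):.2f}")  # 0.83 for this toy data
```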
arXiv Detail & Related papers (2024-01-04T04:58:02Z)
- A Diachronic Perspective on User Trust in AI under Uncertainty [52.44939679369428]
Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust.
We study the evolution of user trust in response to trust-eroding events using a betting game.
arXiv Detail & Related papers (2023-10-20T14:41:46Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)
- Confidence-Calibrated Face and Kinship Verification [8.570969129199467]
We introduce an effective confidence measure that allows verification models to convert a similarity score into a confidence score for any given face pair.
We also propose a confidence-calibrated approach, termed Angular Scaling (ASC), which is easy to implement and can be readily applied to existing verification models.
To the best of our knowledge, our work presents the first comprehensive confidence-calibrated solution for modern face and kinship verification tasks.
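The abstract does not spell out the ASC formulation; purely as a hedged illustration of converting a similarity score into a confidence score, one might use a calibrated sigmoid (parameters hypothetical; this is NOT the paper's Angular Scaling):

```python
import numpy as np

def similarity_to_confidence(cos_sim: float, scale: float = 8.0, shift: float = 0.3) -> float:
    """Map a verification similarity score into a confidence in [0, 1].

    A generic scaled-sigmoid sketch; `scale` and `shift` stand in for
    parameters that would be calibrated on held-out pairs. This is not the
    paper's Angular Scaling (ASC) formulation.
    """
    return float(1.0 / (1.0 + np.exp(-scale * (cos_sim - shift))))

print(similarity_to_confidence(0.65))  # higher similarity -> higher confidence
```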
arXiv Detail & Related papers (2022-10-25T10:43:46Z)
- Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
- Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method.
In practice, the confidence may be skewed by bias arising in an annotation process.
We introduce a parameterized model of the skewed confidence and propose a method for selecting the hyperparameter.
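As a hedged sketch of what a parameterized skew correction could look like (a power transform chosen purely for illustration; the paper's actual parameterization and selection method may differ):

```python
import numpy as np

def debias_confidence(r: np.ndarray, gamma: float) -> np.ndarray:
    """Hypothetical parameterized model of annotation skew.

    A power transform is one simple parameterization of skewed confidence;
    gamma plays the role of the hyperparameter to be selected.
    """
    return np.clip(r, 1e-6, 1.0) ** gamma

raw = np.array([0.95, 0.90, 0.99, 0.85])  # optimistically skewed confidences
for gamma in (1.0, 1.5, 2.0):             # candidate hyperparameter values
    print(gamma, debias_confidence(raw, gamma).round(3))
```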
arXiv Detail & Related papers (2020-01-29T00:04:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.